scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
51.16k stars, 10.35k forks

Explanation of the robots.txt exclusion standard in DownloaderMiddleware.robotstxt.py #6244

Closed: mery16q closed this issue 2 months ago

mery16q commented 2 months ago

In the project's code, among the downloader middlewares, there is a class called RobotsTxtMiddleware, and at first sight I did not know what it did. After reading the documentation I understood it, but the documentation did not clarify the robots.txt exclusion standard for me. So I suggest adding that explanation as a comment in the class's code, describing what the class does and what the robots.txt exclusion standard is.

wRAR commented 2 months ago

The docs already link to https://www.robotstxt.org/, is that not enough?
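
As a rough illustration of what the robots.txt exclusion standard means in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not Scrapy's RobotsTxtMiddleware code. The robots.txt rules, the `examplebot` user agent, the `is_allowed` helper and the example.com URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would first download
# this file from https://<host>/robots.txt for each host it visits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url: str, user_agent: str = "examplebot") -> bool:
    """Return True if the (hypothetical) rules above permit fetching url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("https://example.com/index.html"))         # True
    print(is_allowed("https://example.com/private/data.html"))  # False
```

In Scrapy, the same kind of check is controlled by the `ROBOTSTXT_OBEY` setting: when it is enabled, RobotsTxtMiddleware fetches each site's robots.txt and filters out requests that the file disallows.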