Closed josegicar closed 1 month ago
The docs already link to https://www.robotstxt.org/, is that not enough?
It could be better for users to read about it in the comments at the start of the file itself, rather than just being given the URL :)
The canonical place for the middleware documentation is the documentation, not the module docstrings.
Summary
In the file "robotstxt.py", I recommend adding more comments about how the protocol works, since it is not clear to some users.
Motivation
This suggestion is meant to help people who read the robotstxt.py file understand how it works and what the robots exclusion standard does. I took into account issue #6244, where "mery16q" did not fully understand the robots protocol.
Describe alternatives you've considered
I opened pull request #6287, where I added some comments at the start of the file at the path "scrapy/downloadermiddlewares/robotstxt.py". They explain how robots.txt works.
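
For readers who just want a feel for what the robots exclusion standard does, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the code from the pull request and not how the Scrapy middleware is implemented; the rules, the `examplebot` user agent, and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every crawler may fetch anything
# except URLs under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file content as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) reports whether the rules permit this crawler to fetch the URL.
allowed_public = parser.can_fetch("examplebot", "https://example.com/index.html")
allowed_private = parser.can_fetch("examplebot", "https://example.com/private/data.html")

print("public page allowed:", allowed_public)
print("private page allowed:", allowed_private)
```

A well-behaved crawler performs this kind of check before each request and skips any URL the site's robots.txt disallows for its user agent.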