scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

More documentation needed about the robots.txt protocol #6288

Closed josegicar closed 1 month ago

josegicar commented 1 month ago

Summary

In the file "robotstxt.py", I recommend adding more comments about how the protocol works, since it is not clear to some users.

Motivation

This suggestion was created so that people who read the robotstxt.py file know how it works and what the robots exclusion standard does. I took into account issue #6244, where "mery16q" did not fully understand the robots protocol.

Describe alternatives you've considered

I opened pull request #6287, where I added some comments at the start of the file at the path "scrapy/downloadermiddlewares/robotstxt.py". There you can read how robots.txt works.
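For readers unfamiliar with what the robots exclusion standard actually does, here is a minimal sketch using Python's standard-library `urllib.robotparser` (not Scrapy's middleware; the robots.txt body and URLs below are made up for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt policy: forbid every user agent
# from the /private/ path prefix, allow everything else.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(user_agent, url) answers: may this agent crawl this URL?
print(parser.can_fetch("mybot", "https://example.com/public/page"))   # True
print(parser.can_fetch("mybot", "https://example.com/private/data"))  # False
```

Scrapy's RobotsTxtMiddleware applies the same kind of check before each request when `ROBOTSTXT_OBEY` is enabled.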

wRAR commented 1 month ago

The docs already link to https://www.robotstxt.org/ , is that not enough?

josegicar commented 1 month ago

It could be better for users to read it in the comments at the start of said file, rather than just following a URL :)

wRAR commented 1 month ago

The canonical place for the middleware documentation is the documentation, not the module docstrings.