scrapy / protego

A pure-Python robots.txt parser with support for modern conventions.
BSD 3-Clause "New" or "Revised" License

Disallowing / does not work when the target URL path is missing #17

Closed Gallaecio closed 2 years ago

Gallaecio commented 2 years ago
>>> from protego import Protego
>>> robots_txt = "User-Agent: *\nDisallow: /\n"
>>> robots_txt_parser = Protego.parse(robots_txt)
>>> robots_txt_parser.can_fetch("http://example.com/", "mybot")
False
>>> robots_txt_parser.can_fetch("http://example.com", "mybot")
True

Both calls should return False: when a URL has no path component, the path / is implied, so the Disallow: / rule should match it as well.
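
Until the parser handles this itself, a caller could normalize URLs before passing them to can_fetch. The following is only a minimal sketch of that workaround using the standard library, not Protego's own fix; the helper name with_explicit_root_path is hypothetical and is not part of Protego.

```python
from urllib.parse import urlsplit, urlunsplit


def with_explicit_root_path(url: str) -> str:
    """Return url with an explicit "/" path if its path component is empty.

    Hypothetical helper for illustration; not part of Protego.
    """
    parts = urlsplit(url)
    # Only touch absolute URLs (those with a host) whose path is empty.
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)
```

With this helper, the second call from the report could be written as robots_txt_parser.can_fetch(with_explicit_root_path("http://example.com"), "mybot"), so that the parser receives the same URL as in the first call. URLs that already have a path are returned unchanged.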