scrapy / protego

A pure-Python robots.txt parser with support for modern conventions.
BSD 3-Clause "New" or "Revised" License

Wrong handling of rule with wildcard in path #51

Open kox-solid opened 1 month ago

kox-solid commented 1 month ago

Protego version: 0.3.1

from protego import Protego

content = """
User-agent: *
Allow:    /*/filter/page=*/$
Disallow: /
"""
robots = Protego.parse(content)
user_agent = "mozilla"

url = "https://example.com/1/filter/page=5/"
print(robots.can_fetch(url, user_agent))

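This prints False, but I expected True: the URL path /1/filter/page=5/ matches the Allow rule, which is longer and more specific than "Disallow: /".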
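To show the expected outcome independently of Protego, here is a minimal sketch using only the standard library. It assumes Google-style matching semantics: `*` matches any sequence of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins ties. The helpers `pattern_to_regex` and `is_allowed` are hypothetical names for illustration, not part of Protego's API.

```python
import re


def pattern_to_regex(pattern):
    """Hypothetical helper: translate a robots.txt path pattern into a regex."""
    # A trailing "$" anchors the pattern to the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules):
    """Assumed semantics: longest matching pattern wins, Allow wins ties."""
    best = None
    for directive, pattern in rules:
        # re.match anchors at the start of the path, as robots.txt rules do.
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), directive == "allow")
            if best is None or key > best:
                best = key
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]


rules = [("allow", "/*/filter/page=*/$"), ("disallow", "/")]
print(is_allowed("/1/filter/page=5/", rules))
print(is_allowed("/1/other/", rules))
```

Under these assumptions the first path is allowed and the second is not, which is the behaviour I would expect from can_fetch for the rules above.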