scrapy / protego

A pure-Python robots.txt parser with support for modern conventions.
BSD 3-Clause "New" or "Revised" License

Accept robots.txt as bytes #16

Closed: Gallaecio closed this issue 2 years ago

Gallaecio commented 2 years ago
>>> from protego import Protego
>>> robots_txt = b"User-Agent: *\nDisallow: /\n"
>>> robots_txt_parser = Protego.parse(robots_txt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/temporal/venv/lib/python3.9/site-packages/protego.py", line 310, in parse
    o._parse_robotstxt(content)
  File "/home/adrian/temporal/venv/lib/python3.9/site-packages/protego.py", line 327, in _parse_robotstxt
    hash_pos = line.find('#')
TypeError: argument should be integer or bytes-like object, not 'str'
>>> robots_txt = "User-Agent: *\nDisallow: /\n"
>>> robots_txt_parser = Protego.parse(robots_txt)
>>>
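
The traceback above shows the parser calling `line.find('#')` with a `str` argument on `bytes` input, which is what raises the `TypeError`. Until bytes are accepted directly, one possible workaround is to decode on the caller's side before parsing. The sketch below is an assumption, not part of Protego: `to_text` is a hypothetical helper, and both the UTF-8 default and `errors="replace"` are illustrative choices.

```python
from typing import Union


def to_text(content: Union[str, bytes, bytearray], encoding: str = "utf-8") -> str:
    """Return robots.txt content as str, decoding bytes-like input if needed.

    Hypothetical helper, not part of Protego. The UTF-8 default and the
    errors="replace" policy are assumptions chosen for illustration.
    """
    if isinstance(content, (bytes, bytearray)):
        # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
        return bytes(content).decode(encoding, errors="replace")
    # Already text: return unchanged.
    return content


robots_txt = b"User-Agent: *\nDisallow: /\n"
text = to_text(robots_txt)
# text is now a str, so str methods such as text.find('#') work as expected.
```

With Protego installed, `Protego.parse(to_text(robots_txt))` should then work for either input type, since the second call in the session above shows that `str` input parses without error.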