scrapy / protego

A pure-Python robots.txt parser with support for modern conventions.
BSD 3-Clause "New" or "Revised" License

Use `pyre2` as optional dependency for RegExp speedup. #47

Open rtb-zla-karma opened 6 months ago

rtb-zla-karma commented 6 months ago

Just throwing out a far-future idea.

I've seen that your lib is 40% slower than `RobotFileParser` from Python versions < 3.13. I suspect this is because of `re` module compilation and matching.

`pyre2` is a drop-in replacement for `re` that is faster for simple patterns, which are exactly what robots.txt relies on. `pyre2` falls back to `re` when a pattern uses RegExp features it doesn't support (like lookarounds), but that won't be the case here.
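To make the "optional dependency" part concrete, here is a minimal sketch of how the fallback could look. It is not Protego's actual code: `compile_rule` is a hypothetical helper, the simplified wildcard translation is only for illustration, and I'm assuming the installed package is importable as `re2`.

```python
import re  # stdlib; always available, used here for escaping

# Prefer the RE2-backed engine when it is installed, otherwise use the stdlib.
# Assumption: the pyre2 distribution is importable under the name `re2`.
try:
    import re2 as regex_engine
    USING_RE2 = True
except ImportError:
    regex_engine = re
    USING_RE2 = False


def compile_rule(pattern):
    """Hypothetical helper: turn a robots.txt path pattern into a compiled regex.

    Only the two robots.txt specials are handled: `*` (any run of characters)
    and a trailing `$` (end of path). This is a simplification.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*" wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return regex_engine.compile(body + ("$" if anchored else ""))


rule = compile_rule("/private/*.html$")
print(USING_RE2, bool(rule.match("/private/a/b.html")))
```

With this shape nothing changes for users who don't install the extra package, and the rest of the code only ever touches `regex_engine`.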

My claims about the potential speedup should of course be tested with your lib, but nonetheless I think they are worth considering.
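A rough way to check the claim could be something like the sketch below. It only times raw pattern matching with `timeit`, not Protego end to end, and the pattern, paths, and `time_matching` helper are made up for illustration. The `re2` import name is again an assumption, and the comparison is skipped when that module is not installed.

```python
import re
import timeit

# Illustrative inputs only; a real benchmark should use real robots.txt rules.
PATTERN = r"/private/.*\.html$"
PATHS = ["/private/a/b.html", "/public/index.html", "/private/notes.txt"]


def time_matching(engine, pattern, paths, number=1000):
    """Return the seconds taken to match every path `number` times."""
    compiled = engine.compile(pattern)
    return timeit.timeit(lambda: [compiled.match(p) for p in paths], number=number)


results = {"re": time_matching(re, PATTERN, PATHS)}

# Assumption: pyre2 is importable as `re2`; skip it when it is not installed.
try:
    import re2
except ImportError:
    re2 = None

if re2 is not None:
    results["re2"] = time_matching(re2, PATTERN, PATHS)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}s")
```

Compilation time would need to be measured separately, since this sketch compiles each pattern once outside the timed loop.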