monperrus / crawler-user-agents

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:
MIT License

Escape dots in patterns #338

Closed giuscris closed 7 months ago

giuscris commented 10 months ago

Thank you @monperrus for this repository.

I've noticed that several patterns contain unescaped dots, causing false positives. I fixed these patterns and added a check to validate.py to ensure they're properly escaped.
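To illustrate the false-positive problem described above, here is a minimal Python sketch. The pattern `example.com/bot` and the helper `unescaped_dots` are hypothetical, made up for illustration; this is not the check that was actually added to validate.py. The scan is deliberately simple: it skips escaped characters and ignores dots inside character classes, and it does not handle every regexp edge case.

```python
import re
from typing import List


def unescaped_dots(pattern: str) -> List[int]:
    """Return the positions of '.' characters that are neither escaped
    with a backslash nor inside a [...] character class."""
    positions = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "." and not in_class:
            positions.append(i)
        i += 1
    return positions


# Hypothetical pattern: the unescaped dot matches any character,
# so an unrelated user agent is reported as a crawler.
loose = "example.com/bot"
strict = r"example\.com/bot"

false_positive = re.search(loose, "examplexcom/bot") is not None   # True
fixed = re.search(strict, "examplexcom/bot") is not None           # False
```

A validation script could fail the build whenever `unescaped_dots(pattern)` is non-empty, although a real check would need to decide whether an intentional wildcard such as `.*` should be allowed.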

monperrus commented 9 months ago

Thanks @giuscris

The one advantage of unescaped dots is that the patterns can also be used for plain string matching, in case your platform has no regexp support or you care about speed.

giuscris commented 9 months ago

Yes, of course you can string-match, but some patterns are already regular expressions. It would be a partial match in any case.

monperrus commented 9 months ago

You're right. Maybe we should have a mandatory pattern field and an optional pattern_regexp field, and change the CI validation script accordingly.
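One way the proposed two-field layout might look is sketched below in Python. The thread names only the fields `pattern` and `pattern_regexp`; the entry contents, the helpers `validate_entry` and `is_crawler`, and the fallback behaviour are assumptions made for illustration, not the repository's actual schema or validation script.

```python
import re
from typing import List


def validate_entry(entry: dict) -> List[str]:
    """Return a list of problems found in one entry (empty if valid)."""
    errors = []
    pattern = entry.get("pattern")
    if not isinstance(pattern, str) or not pattern:
        errors.append("'pattern' is mandatory and must be a non-empty string")
    regexp = entry.get("pattern_regexp")
    if regexp is not None:
        if not isinstance(regexp, str):
            errors.append("'pattern_regexp' must be a string")
        else:
            try:
                re.compile(regexp)
            except re.error as exc:
                errors.append(f"'pattern_regexp' does not compile: {exc}")
    return errors


def is_crawler(user_agent: str, entry: dict, use_regexp: bool = True) -> bool:
    """Use the regexp when it is available and wanted; otherwise fall
    back to a plain substring test on the mandatory 'pattern' field."""
    regexp = entry.get("pattern_regexp")
    if use_regexp and regexp is not None:
        return re.search(regexp, user_agent) is not None
    return entry["pattern"] in user_agent


# Hypothetical entry: a literal string plus its escaped regexp form.
entry = {"pattern": "example.com/bot", "pattern_regexp": r"example\.com/bot"}
```

With this split, platforms without regexp support can rely on `pattern` alone through `is_crawler(ua, entry, use_regexp=False)`, while regexp-capable consumers get the properly escaped form.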

doublex commented 9 months ago

+1 Works for me

monperrus commented 7 months ago

Thanks a lot @giuscris