temoto / robotstxt

The robots.txt exclusion protocol implementation for Go language
MIT License

Yet more files now parse #32

Closed DoryGuy closed 2 years ago

DoryGuy commented 2 years ago

More fixes so that more badly written robots.txt files get parsed with reasonable values.

DoryGuy commented 2 years ago

The key change is that any valid rule that appears before a user-agent is specified is assumed to belong to the group "*", that is, all user-agents. If you look at the robots.txt for officedepot.com, you can see that's what they meant, even if they can't read a spec.
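
To illustrate the grouping behavior described here, a minimal sketch follows. It is not the code from this pull request or the library's parser: `groupRules` is a hypothetical helper, and it assumes only `Allow` and `Disallow` lines count as rules and that `#` starts a comment.

```go
package main

import (
	"bufio"
	"strings"
)

// groupRules is a hypothetical helper, not part of temoto/robotstxt.
// It maps each user-agent to its rule lines. Rules that appear before
// any User-agent line are assigned to the wildcard group "*".
func groupRules(robots string) map[string][]string {
	groups := make(map[string][]string)
	agents := []string{"*"} // assumption: implicit group until a User-agent line is seen
	inRules := true         // true once the current group has started collecting rules

	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)

		switch key {
		case "user-agent":
			// A User-agent line after rules starts a new group;
			// consecutive User-agent lines share one group.
			if inRules {
				agents = nil
				inRules = false
			}
			agents = append(agents, value)
		case "allow", "disallow":
			inRules = true
			for _, a := range agents {
				groups[a] = append(groups[a], key+": "+value)
			}
		}
	}
	return groups
}
```

With this sketch, an input that starts with `Disallow: /cart` and only later declares `User-agent: bot` files the first rule under `"*"` and later rules under `"bot"`.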

temoto commented 2 years ago

Yes, understood, thank you. If you want something merged, please make it independent of other changes. Please don't make duplicate pull requests.