temoto / robotstxt

The robots.txt exclusion protocol implementation for Go language
MIT License
269 stars 55 forks

List allow & disallow #38

Open TheUltimateCookie opened 2 years ago

TheUltimateCookie commented 2 years ago

Is it currently possible to list all allow and disallow paths, together with the user agents they apply to, without specifying a particular user agent?

temoto commented 2 years ago

Duplicates https://github.com/temoto/robotstxt/pull/26

Right now there is no public API to read parsed rules.

Please describe (best in pseudo-code) how you would use it.

TheUltimateCookie commented 2 years ago

This is part of a large web scraping process. Some of our clients have large robots.txt files with many disallowed paths, so we need to know those paths before scraping starts, and also for other SEO activities.

Example: https://plantx.com/robots.txt
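
Since the library has no public API for reading parsed rules, one possible interim approach is to list the rules with a small standalone parser. The following is only a rough sketch using the Go standard library, not part of temoto/robotstxt: the `Group` type and the `ListRules` function are hypothetical names, and the parsing is simplified (it assumes the usual `key: value` line format, groups consecutive `User-agent` lines together, and ignores every directive other than `Allow` and `Disallow`).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Group is a hypothetical type: the user agents named by one robots.txt
// group, plus the Allow and Disallow paths listed for them.
type Group struct {
	Agents   []string
	Allow    []string
	Disallow []string
}

// ListRules is a hypothetical helper (not a temoto/robotstxt API).
// It returns every group found in body, in file order.
func ListRules(body string) []Group {
	var groups []Group
	inAgents := false // true while reading consecutive User-agent lines

	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue // blank or malformed line
		}
		key := strings.ToLower(strings.TrimSpace(parts[0]))
		val := strings.TrimSpace(parts[1])

		switch key {
		case "user-agent":
			// A User-agent line that follows a rule starts a new group.
			if !inAgents {
				groups = append(groups, Group{})
				inAgents = true
			}
			g := &groups[len(groups)-1]
			g.Agents = append(g.Agents, val)
		case "allow", "disallow":
			inAgents = false
			if len(groups) == 0 || val == "" {
				continue // rule before any User-agent, or empty value
			}
			g := &groups[len(groups)-1]
			if key == "allow" {
				g.Allow = append(g.Allow, val)
			} else {
				g.Disallow = append(g.Disallow, val)
			}
		}
	}
	return groups
}

func main() {
	// Inline sample; in practice the body would come from an HTTP response.
	sample := "User-agent: *\nDisallow: /admin\nAllow: /public\n\n" +
		"User-agent: examplebot\nDisallow: /\n"
	for _, g := range ListRules(sample) {
		fmt.Printf("agents=%v allow=%v disallow=%v\n", g.Agents, g.Allow, g.Disallow)
	}
}
```

A sketch like this only enumerates rules. Deciding whether a given path is actually allowed for a given agent still needs proper matching logic, which is what the library's existing per-agent API is for.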