rakanalh / grawler

A web crawler / scraper engine written in Golang
MIT License
30 stars, 3 forks

Does grawler follow the robots.txt policies and crawl delays? #1

Open LeMoussel opened 7 years ago

LeMoussel commented 7 years ago

Hi,

Does grawler follow the robots.txt policies and crawl delays?

rakanalh commented 7 years ago

Hello @LeMoussel

I am not sure what you mean by crawl delays, but as for robots.txt, the answer is: not really.

LeMoussel commented 7 years ago

The Crawl-delay directive indicates the number of seconds a crawler/spider should wait between successive requests.

Example robots.txt with Crawl-delay:

User-agent: Googlebot
Crawl-delay: 20

User-agent: Slurp
Crawl-delay: 20

User-Agent: msnbot
Crawl-Delay: 20

Reference: Yandex, Using robots.txt (https://yandex.com/support/webmaster/controlling-robot/robots-txt.xml)
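
To make the idea concrete, here is a minimal Go sketch of how a crawler could read Crawl-delay from a robots.txt body. It is only an illustration under stated assumptions: `crawlDelay` and `sampleRobots` are hypothetical names and not part of grawler, user agents are matched by simple case-insensitive substring, and a `*` group is used as a fallback. It uses only the standard library.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sampleRobots mirrors the robots.txt example above (hypothetical test data).
const sampleRobots = "User-agent: Googlebot\nCrawl-delay: 20\n\n" +
	"User-agent: Slurp\nCrawl-delay: 20\n\n" +
	"User-Agent: msnbot\nCrawl-Delay: 20\n"

// crawlDelay returns the Crawl-delay declared for userAgent in the given
// robots.txt content. A group for "*" is used only when no group names the
// agent. The boolean is false when no applicable delay is declared.
func crawlDelay(robots, userAgent string) (time.Duration, bool) {
	var (
		agents                []string // agents of the group currently being read
		inRules               bool     // true once the group contains a rule line
		exact, wildcard       time.Duration
		hasExact, hasWildcard bool
	)
	ua := strings.ToLower(userAgent)
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop trailing comments
		}
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		// Directive names are compared case-insensitively.
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules starts a new group
				agents, inRules = nil, false
			}
			agents = append(agents, strings.ToLower(value))
		case "crawl-delay":
			inRules = true
			secs, err := strconv.ParseFloat(value, 64)
			if err != nil || secs < 0 {
				continue // ignore malformed values
			}
			d := time.Duration(secs * float64(time.Second))
			for _, a := range agents {
				switch {
				case a == "*":
					wildcard, hasWildcard = d, true
				case a != "" && strings.Contains(ua, a):
					exact, hasExact = d, true
				}
			}
		default:
			inRules = true
		}
	}
	if hasExact {
		return exact, true
	}
	return wildcard, hasWildcard
}

func main() {
	if d, ok := crawlDelay(sampleRobots, "msnbot"); ok {
		fmt.Println("wait between requests:", d)
	}
}
```

A crawler could then pause for the returned duration (for example with `time.Sleep`) between two requests to the same host, and fall back to its own default when no delay is declared.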