taganaka / polipus

Polipus: distributed and scalable web-crawler framework
MIT License
Support for robots.txt #30

Closed: taganaka closed this 10 years ago

taganaka commented 10 years ago

If enabled, Polipus will obey robots.txt directives. If the user agent in use is not allowed by a site's robots.txt, Polipus will refuse to follow that site's URLs.

Non-standard robots.txt directives such as Crawl-delay are not supported.
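
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code from this pull request: the `SimpleRobots` class and its `allowed?` method are hypothetical names, and the parser assumes plain prefix matching on `User-agent`, `Allow`, and `Disallow` lines, ignoring wildcards and delay directives.

```ruby
# Hypothetical, simplified robots.txt checker (not Polipus's implementation).
# Supports only User-agent groups with Allow/Disallow prefix rules.
class SimpleRobots
  def initialize(robots_txt)
    @groups = {}      # downcased agent token => [[:allow | :disallow, path_prefix], ...]
    agents = []       # agents of the group currently being parsed
    in_rules = false  # true once the current group has seen a rule line

    robots_txt.each_line do |line|
      line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
      next if line.empty?

      field, value = line.split(':', 2).map(&:strip)
      next if value.nil?

      case field.downcase
      when 'user-agent'
        # A User-agent line that follows rule lines starts a new group.
        if in_rules
          agents = []
          in_rules = false
        end
        agents << value.downcase
        @groups[value.downcase] ||= []
      when 'allow', 'disallow'
        in_rules = true
        next if value.empty? # an empty Disallow places no restriction
        agents.each { |agent| @groups[agent] << [field.downcase.to_sym, value] }
      end
    end
  end

  # True if user_agent may fetch path under the parsed rules.
  def allowed?(user_agent, path)
    ua = user_agent.downcase
    # Prefer a group whose token appears in the user agent, else fall back to '*'.
    key = @groups.keys.reject { |k| k == '*' }.find { |k| ua.include?(k) } || '*'
    rules = @groups.fetch(key, [])
    # The longest matching prefix decides; no matching rule means allowed.
    match = rules.select { |_, prefix| path.start_with?(prefix) }
                 .max_by { |_, prefix| prefix.length }
    match.nil? || match.first == :allow
  end
end

robots = SimpleRobots.new(<<~TXT)
  User-agent: *
  Disallow: /

  User-agent: polipus
  Disallow: /private
TXT

robots.allowed?('polipus', '/index.html')  # => true
robots.allowed?('polipus', '/private/a')   # => false
robots.allowed?('otherbot', '/index.html') # => false
```

In this sketch a crawler would call `allowed?` before enqueuing each URL and skip the URL when it returns false, which mirrors the "refuse to follow" behaviour described above.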

coveralls commented 10 years ago

Coverage decreased (-0.05%) when pulling 8c7d68d0f7b101aa03572f6ef1c6b39e400080fa on robotstxt into 637c70c3ccc80e73282a59ca702067d29d18234e on master.

coveralls commented 10 years ago

Coverage decreased (-0.07%) when pulling bf74752575b731c041d9c094efa8816cb5a2a1d1 on robotstxt into 9e1fe6e8a6fa853c515d14257b7858aabe2ff19f on master.

ABrisset commented 10 years ago

A big thank you for that! Can't wait to test it.