Not obeying robots.txt

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. I got an email from a website saying that the crawler doesn't obey the 
robots.txt file: it is still crawling pages that robots.txt restricts. I do not 
know why that is. A similar issue was reported against an earlier version.
What is the expected output? What do you see instead?

What version of the product are you using?
I am using version 3.3

Please provide any additional information below.

Original issue reported on code.google.com by ktar...@gmail.com on 13 Aug 2012 at 8:37

GoogleCodeExporter commented 9 years ago

Please supply an example URL so I can check for myself.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 2:09

GoogleCodeExporter commented 9 years ago

Closed due to inactivity and the lack of a reproducible scenario.

Original comment by avrah...@gmail.com on 23 Sep 2014 at 2:05

Changed state: Invalid

xrma / crawler4j

Not obeying robots.txt #168