smasher125354 / crawler4j

Automatically exported from code.google.com/p/crawler4j

robots.txt isn't crawled #335

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When robots.txt isn't served as text/plain, it isn't crawled.

Original issue reported on code.google.com by avrah...@gmail.com on 22 Jan 2015 at 4:12

GoogleCodeExporter commented 9 years ago
Fixed in Revision: 853afc5a5f13

Original comment by avrah...@gmail.com on 22 Jan 2015 at 4:59

GoogleCodeExporter commented 9 years ago
Why do we need to support crawling non-text robots.txt? Isn't that against the 
standard?

Original comment by ganjisaffar@gmail.com on 27 Jan 2015 at 6:25

GoogleCodeExporter commented 9 years ago
I asked on the crawler-commons internal forum, and Ken, who is a real crawling expert, responded that robots.txt is legitimate even when served as text/html.

Here is his answer:
https://groups.google.com/forum/#!topic/crawler-commons/1yiK9l-uP0k

Original comment by avrah...@gmail.com on 27 Jan 2015 at 6:38
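
For readers who want to see what a lenient content-type check could look like, here is a minimal sketch. It is not the code from revision 853afc5a5f13 and does not use any crawler4j API; the class name `RobotsContentTypeCheck`, the method `shouldParseRobots`, and the rule "accept any `text/*` type, and treat a missing header as text" are all assumptions made for illustration.

```java
import java.util.Locale;

public class RobotsContentTypeCheck {

    // Hypothetical helper (not part of crawler4j): decides whether the body of
    // a robots.txt response should be handed to the robots.txt parser.
    public static boolean shouldParseRobots(int statusCode, String contentType) {
        if (statusCode != 200) {
            return false;
        }
        if (contentType == null || contentType.trim().isEmpty()) {
            // Assumption: a missing Content-Type header is treated as text.
            return true;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        // Lenient rule (assumption): accept any text/* type, so text/html passes too.
        return mediaType.startsWith("text/");
    }

    public static void main(String[] args) {
        System.out.println(shouldParseRobots(200, "text/plain"));               // true
        System.out.println(shouldParseRobots(200, "Text/HTML; charset=UTF-8")); // true
        System.out.println(shouldParseRobots(200, "application/pdf"));          // false
        System.out.println(shouldParseRobots(404, "text/plain"));               // false
    }
}
```

A stricter crawler would accept only `text/plain`, which is the behavior this issue reports as a problem; the sketch simply widens that test to the whole `text/` family.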