seantanwh / crawler4j

Automatically exported from code.google.com/p/crawler4j

Crawl Site Maps #304

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Is crawler4j able to crawl sitemaps?

Original issue reported on code.google.com by edgar.ri...@gmail.com on 6 Sep 2014 at 3:20

GoogleCodeExporter commented 8 years ago
Of course.

Just point it to a sitemap and it will crawl it.

You might need to enable binary content, though.

But if you want to crawl only sitemaps, I suggest using crawler-commons:
https://code.google.com/p/crawler-commons/

Original comment by avrah...@gmail.com on 6 Sep 2014 at 6:33
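To make "point it to a sitemap and it will crawl it" concrete, a crawler reads the sitemap's `<loc>` entries and schedules them. The sketch below is plain Java 9+ using only the JDK's XML APIs. It is not crawler4j's or crawler-commons' code, and the names `SitemapCrawlSketch`, `collectPageUrls`, and the `fetch` function (a hypothetical stand-in for HTTP) are assumptions. It follows the sitemaps.org protocol, where a `<sitemapindex>` root lists further sitemaps and a `<urlset>` root lists pages.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of crawler4j or crawler-commons.
final class SitemapCrawlSketch {
    // Namespace defined by the sitemaps.org protocol (version 0.9).
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Parses XML text into a DOM, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps never need a DTD; rejecting one blocks XXE-style input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable sitemap", e);
        }
    }

    /** Returns the trimmed text of every sitemap-namespace <loc> element. */
    static List<String> locs(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(NS, "loc");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /**
     * Starts from one sitemap URL and returns the page URLs it lists, expanding
     * <sitemapindex> files breadth-first. `fetch` is a hypothetical stand-in for HTTP.
     */
    static List<String> collectPageUrls(String sitemapUrl, Function<String, String> fetch) {
        List<String> pages = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(sitemapUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!seen.add(url)) {
                continue; // this sitemap was already processed
            }
            Document doc = parse(fetch.apply(url));
            String root = doc.getDocumentElement().getLocalName();
            if ("sitemapindex".equals(root)) {
                queue.addAll(locs(doc)); // entries are more sitemaps to fetch
            } else if ("urlset".equals(root)) {
                pages.addAll(locs(doc)); // entries are pages to crawl
            } else {
                throw new IllegalArgumentException("Not a sitemap: " + url);
            }
        }
        return pages;
    }

    /** Builds a tiny sitemap document for the demo. */
    static String xml(String root, String entries) {
        return "<" + root + " xmlns=\"" + NS + "\">" + entries + "</" + root + ">";
    }

    public static void main(String[] args) {
        // An in-memory "web": one sitemap index pointing at two urlset sitemaps.
        Map<String, String> web = Map.of(
                "https://example.com/sitemap_index.xml", xml("sitemapindex",
                        "<sitemap><loc>https://example.com/s1.xml</loc></sitemap>"
                        + "<sitemap><loc>https://example.com/s2.xml</loc></sitemap>"),
                "https://example.com/s1.xml", xml("urlset",
                        "<url><loc>https://example.com/a</loc></url>"
                        + "<url><loc>https://example.com/b</loc></url>"),
                "https://example.com/s2.xml", xml("urlset",
                        "<url><loc>https://example.com/c</loc></url>"));
        // Prints [https://example.com/a, https://example.com/b, https://example.com/c]
        System.out.println(collectPageUrls("https://example.com/sitemap_index.xml", web::get));
    }
}
```

The `seen` set keeps an index that lists the same child twice, or refers back to itself, from being processed again.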

GoogleCodeExporter commented 8 years ago
I checked, and it doesn't crawl links in binary data.

This means it doesn't crawl sitemaps.

Original comment by avrah...@gmail.com on 18 Sep 2014 at 4:09
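The thread doesn't say why sitemaps fell into the binary-data path. One plausible explanation (an assumption, not confirmed here) is that XML responses, and especially gzip-compressed `.xml.gz` sitemaps, which the sitemaps.org protocol allows, are classified as binary by content type, so their links are never extracted. Below is a minimal Java 9+ sketch of treating such a body as a sitemap anyway: detect the gzip magic bytes, decompress, then pull out the `<loc>` links. The class and method names are hypothetical, and this is not the change made in the revision mentioned later in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of crawler4j.
final class BinarySitemapSketch {
    /** True if the body starts with the gzip magic number 0x1f 0x8b. */
    static boolean isGzip(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    /** Returns decompressed bytes for gzip bodies, or the body unchanged otherwise. */
    static byte[] maybeGunzip(byte[] body) {
        if (!isGzip(body)) {
            return body;
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Treats a "binary" response body as a sitemap and extracts its <loc> links. */
    static List<String> linksFromBody(byte[] body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps never need a DTD; rejecting one blocks XXE-style input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Parsing from bytes lets the XML declaration decide the character encoding.
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(maybeGunzip(body)));
            // "*" matches <loc> in any namespace, including extension tags such as image:loc.
            NodeList nodes = doc.getElementsByTagNameNS("*", "loc");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable sitemap body", e);
        }
    }

    /** Gzip-compresses bytes (used here only to build demo input). */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // stream is closed here, so the gzip trailer is written
    }

    public static void main(String[] args) {
        byte[] plain = ("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.com/a</loc></url></urlset>")
                .getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(plain);
        System.out.println(isGzip(compressed));        // true
        System.out.println(linksFromBody(compressed)); // [https://example.com/a]
        System.out.println(linksFromBody(plain));      // [https://example.com/a]
    }
}
```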

GoogleCodeExporter commented 8 years ago
Fixed in rev: 1ac149397bef  

Now all sitemaps are crawlable.

Original comment by avrah...@gmail.com on 23 Sep 2014 at 11:13

GoogleCodeExporter commented 8 years ago
Issue 314 has been merged into this issue.

Original comment by avrah...@gmail.com on 10 Oct 2014 at 9:45