Hi,
Even when seeded with 5 or so domains and with numCrawlers set to 10, crawler4j crawls only one domain at a time. Given that the politeness delay is about 1 to 5 seconds, that's only a few dozen pages per minute, whereas crawling different domains in parallel would multiply the pages-per-minute rate by roughly the number of domains.
Is there a reason why a single CrawlController doesn't crawl multiple domains in parallel?
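To put rough numbers on this, here is a back-of-the-envelope sketch in plain Java. It does not use crawler4j at all; `CrawlThroughput` and `pagesPerMinute` are hypothetical names made up for illustration. It assumes each domain yields one page per politeness delay and ignores download and parsing time, so the figures are upper bounds rather than measurements.

```java
public class CrawlThroughput {

    // Upper bound on pages per minute, assuming one fetch per politeness
    // delay per domain and `parallelDomains` domains crawled at once.
    // Download and parsing time are ignored (an assumption of this sketch).
    public static double pagesPerMinute(long politenessDelayMs, int parallelDomains) {
        if (politenessDelayMs <= 0 || parallelDomains <= 0) {
            throw new IllegalArgumentException("delay and domain count must be positive");
        }
        return parallelDomains * 60_000.0 / politenessDelayMs;
    }

    public static void main(String[] args) {
        long[] delaysMs = {1000, 5000}; // the 1 to 5 second range from the question
        for (long delayMs : delaysMs) {
            System.out.printf(
                "delay=%d ms: 1 domain -> %.0f pages/min, 5 domains -> %.0f pages/min%n",
                delayMs, pagesPerMinute(delayMs, 1), pagesPerMinute(delayMs, 5));
        }
    }
}
```

Under these assumptions, one domain at a time tops out at 12 to 60 pages per minute, while 5 domains in parallel would allow 60 to 300.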