yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0
4.56k stars, 1.93k forks

Even if seeded with different domains, crawler4j crawls one domain at a time #450

Open · theomails opened this issue 4 years ago

theomails commented 4 years ago

Hi, even when seeded with 5 or so domains and with numCrawlers set to 10, crawler4j crawls only one domain at a time. With a politeness delay of about 1 to 5 seconds, that's only a few dozen pages per minute (at most 60 pages per minute at a 1-second delay, and 12 at a 5-second delay). If it crawled the different domains in parallel, pages per minute would go up rapidly.
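The throughput arithmetic above can be checked with a small sketch. It assumes the politeness delay is the only limit (fetch and parse time are ignored) and that, if domains were crawled in parallel, each domain would get its own independent delay. `pagesPerMinute` is a hypothetical helper, not part of crawler4j.

```java
public class PolitenessThroughput {
    // Upper bound on pages per minute when each domain must wait delayMillis
    // between requests. Fetch time is ignored (assumption), and each domain
    // crawled in parallel is assumed to have its own independent delay.
    static double pagesPerMinute(long delayMillis, int domainsInParallel) {
        return domainsInParallel * (60_000.0 / delayMillis);
    }

    public static void main(String[] args) {
        for (long delay : new long[] {1_000, 5_000}) {
            System.out.printf("delay %d ms: one domain %.0f pages/min, five domains %.0f pages/min%n",
                    delay, pagesPerMinute(delay, 1), pagesPerMinute(delay, 5));
        }
    }
}
```

Under these assumptions, five domains crawled in parallel would give five times the single-domain rate (300 instead of 60 pages per minute at a 1-second delay).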

Is there a reason why a single CrawlController doesn't crawl multiple domains in parallel?
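One way a crawler could fetch several domains in parallel and still stay polite is to track the delay per host rather than per controller. The sketch below is a hypothetical illustration of that idea, not crawler4j's actual API or implementation. `PerHostThrottle` and `reserveSlot` are invented names, and the caller is assumed to sleep until the returned start time before fetching.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not crawler4j's API): enforce the politeness delay
 * per host instead of globally, so fetches to different hosts can overlap.
 */
public class PerHostThrottle {
    private final long delayMillis;
    // Earliest time (ms) at which the next request to each host may start.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public PerHostThrottle(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Reserves a fetch slot for the URL's host and returns the time (ms) at
     * which that fetch may start. Requests to other hosts are not delayed.
     */
    public synchronized long reserveSlot(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        long start = Math.max(nowMillis, nextAllowed.getOrDefault(host, nowMillis));
        nextAllowed.put(host, start + delayMillis);
        return start;
    }

    public static void main(String[] args) {
        PerHostThrottle throttle = new PerHostThrottle(1_000);
        String[] urls = {
            "https://a.example/1", "https://a.example/2",
            "https://b.example/1", "https://c.example/1"
        };
        for (String url : urls) {
            System.out.println(url + " may start at t=" + throttle.reserveSlot(url, 0) + " ms");
        }
    }
}
```

In this sketch, the second request to `a.example` waits 1000 ms, while the first requests to `b.example` and `c.example` can start immediately alongside it. This is the parallelism the question asks about.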