mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

What is the maximum number of seeds that can be given? #32

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
I have 50 million seeds that I am trying to add to the Controller, and I want to run multiple threads. Will this work? Assume that I have enough memory and CPU power.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
2.6

Please provide any additional information below.

Original issue reported on code.google.com by aravin...@gmail.com on 29 Apr 2011 at 7:16

GoogleCodeExporter commented 9 years ago
I am assuming that you only want to download the content of these pages and don't really want to crawl hundreds of millions of pages starting from these seeds. If that is the case, it should work on a sufficiently powerful machine. Just make sure to disable the features you don't need (such as robots.txt parsing, ...).

-Yasser

Original comment by ganjisaffar@gmail.com on 29 Apr 2011 at 7:21
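For reference, a minimal sketch of this kind of seed-only setup, written against the later crawler4j 3.x-style Java API rather than the 2.6 release discussed in this issue (2.6 configured options differently). The MyCrawler subclass, the storage path, the seed list, and the thread count are hypothetical placeholders:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

import java.util.List;

public class SeedDownloadExample {

    // Hypothetical crawler that only stores the content of each fetched seed page
    public static class MyCrawler extends WebCrawler {
        @Override
        public void visit(Page page) {
            byte[] content = page.getContentData();
            // persist 'content' wherever needed
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawl");  // hypothetical storage path
        config.setMaxDepthOfCrawling(0);             // fetch only the seeds themselves, do not follow links

        PageFetcher pageFetcher = new PageFetcher(config);

        // Disable robots.txt fetching/parsing, as suggested above
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        robotstxtConfig.setEnabled(false);
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

        // In the scenario from this issue, 'seeds' would hold tens of millions of URLs
        List<String> seeds = List.of("http://example.com/page1", "http://example.com/page2");
        for (String seed : seeds) {
            controller.addSeed(seed);
        }

        // Run the download with multiple crawler threads
        int numberOfCrawlers = 8;
        controller.start(MyCrawler.class, numberOfCrawlers);
    }
}
```

Setting the maximum crawl depth to 0 is one way to express "download the seeds only, don't crawl outward from them," which matches the assumption in the answer above.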