udayinfy / crawler4j

Automatically exported from code.google.com/p/crawler4j

Crawler4j missing more control over retry count #261

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the BasicCrawler example with the robots.txt server enabled
2. Have "addeasy.netfirms.com" as the seed

What is the expected output? What do you see instead?
Expectation: addSeed should return promptly.
Current outcome: addSeed blocks for a very long time.

What version of the product are you using?
All versions

Please provide any additional information below.

My crawler framework uses crawler4j. Recently it became a big headache with the
domain "addeasy.netfirms.com", which has around 300 A records in DNS. While
downloading a page (PageFetcher), the HttpClient library blindly tries every one
of those IPs, with no config option to limit it, because
PoolingClientConnectionManager uses DefaultClientConnectionOperator, which
iterates over all resolved addresses in a for loop.

On top of that, if an exception is raised, HttpClient retries the request,
since its default retry count is 3.
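
For what it's worth, the retry count itself can be forced down on the underlying
HttpClient. A minimal sketch, assuming the 4.2-style DefaultHttpClient API that
was current at the time (where exactly to wire this into crawler4j's PageFetcher
is left out):

    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
    import org.apache.http.impl.conn.PoolingClientConnectionManager;

    public class NoRetryHttpClientFactory {

        // Builds an HttpClient that never retries a failed request.
        // The default handler retries 3 times, which multiplies badly
        // with a host that resolves to hundreds of IPs.
        public static DefaultHttpClient create(PoolingClientConnectionManager connManager) {
            DefaultHttpClient client = new DefaultHttpClient(connManager);
            client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
            return client;
        }
    }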

I could not find a configuration-level solution, so I modified crawler4j to use
a custom pooling connection manager with a modified connection operator.
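
Roughly, the modification looks like the sketch below. The class names are
HttpClient 4.2's; keeping only the first resolved address is simply the policy
I chose, and the crawler4j wiring around it is omitted:

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    import org.apache.http.conn.ClientConnectionOperator;
    import org.apache.http.conn.scheme.SchemeRegistry;
    import org.apache.http.impl.conn.DefaultClientConnectionOperator;
    import org.apache.http.impl.conn.PoolingClientConnectionManager;

    // Connection manager whose operator resolves a host to a single IP,
    // so HttpClient does not iterate over every A record of the domain.
    public class SingleIpPoolingConnectionManager extends PoolingClientConnectionManager {

        public SingleIpPoolingConnectionManager(SchemeRegistry schemeRegistry) {
            super(schemeRegistry);
        }

        @Override
        protected ClientConnectionOperator createConnectionOperator(SchemeRegistry schreg) {
            return new DefaultClientConnectionOperator(schreg) {
                @Override
                protected InetAddress[] resolveHostname(String host) throws UnknownHostException {
                    // Keep only the first resolved address instead of all ~300.
                    InetAddress[] all = InetAddress.getAllByName(host);
                    return new InetAddress[] { all[0] };
                }
            };
        }
    }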

After doing the above, I got rid of the first issue, the retrying against every IP.

But I learned that there is still another, hidden issue: addSeed.

Because addSeed immediately tries to download robots.txt, the call blocks right
there in the main thread. So parallel crawling cannot get started while addSeed
is still being called from the main thread.

I have now solved the issue with a custom controller that collects all the
values passed in via addSeed into a local collection; then, in onBefore (which
is called by the crawler threads), I use the actual addSeed to load that data.
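
A minimal sketch of that workaround follows. DeferredSeedController and its
method names are made up for illustration, and I am assuming WebCrawler's
onStart() callback (run on each crawler thread) is a suitable place to flush
the queued seeds:

    import java.util.concurrent.ConcurrentLinkedQueue;

    import edu.uci.ics.crawler4j.crawler.CrawlConfig;
    import edu.uci.ics.crawler4j.crawler.CrawlController;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.fetcher.PageFetcher;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

    // Controller that only remembers seeds instead of fetching robots.txt
    // for them on the calling (main) thread.
    public class DeferredSeedController extends CrawlController {

        private final ConcurrentLinkedQueue<String> pendingSeeds =
                new ConcurrentLinkedQueue<String>();

        public DeferredSeedController(CrawlConfig config, PageFetcher pageFetcher,
                                      RobotstxtServer robotstxtServer) throws Exception {
            super(config, pageFetcher, robotstxtServer);
        }

        // Queue the seed locally; does not block on robots.txt.
        public void addSeedDeferred(String url) {
            pendingSeeds.add(url);
        }

        // Called from a crawler thread; performs the real (blocking) addSeed calls.
        public void flushSeeds() {
            String url;
            while ((url = pendingSeeds.poll()) != null) {
                super.addSeed(url);
            }
        }
    }

    // Crawler whose onStart() (run on the crawler thread) triggers the real seeding.
    class DeferredSeedCrawler extends WebCrawler {
        @Override
        public void onStart() {
            ((DeferredSeedController) getMyController()).flushSeeds();
        }
    }

With this, the main thread only enqueues URLs via addSeedDeferred(), and the
blocking robots.txt downloads happen on the crawler threads once the controller
is started.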

It would be great if this fix were made in crawler4j itself. :) Just wanted to
share my learnings.

Original issue reported on code.google.com by jeba.ride@gmail.com on 15 Apr 2014 at 1:14

GoogleCodeExporter commented 9 years ago
Would it be possible to post some of the code here? I have the same problem:
before the crawl starts, some pages take ages. It would help me a lot, thanks.

Original comment by ju...@gmx.net on 11 Jul 2014 at 9:59