Open GoogleCodeExporter opened 9 years ago
I think it should be enough to change where Crawler4j stores the queue and to
offer atomic access to the stored queue.
Right now it stores the queue on the local disk. If the queue were stored in a
shared remote folder instead, the URLs in the queue would be partitioned among
the several Crawler4j instances.
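To make the idea concrete, here is a rough sketch of what "atomic access to a shared queue folder" could look like. It is not crawler4j code: the class `SharedDirQueue`, its `offer`/`poll` methods and the `pending`/`claimed` subfolders are all made up for illustration. It assumes every instance mounts the same folder and that the underlying filesystem really performs renames atomically, which has to be verified for the remote filesystem in use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical helper (not part of crawler4j): a URL queue kept in a shared folder. */
final class SharedDirQueue {
    private final Path pending; // URLs waiting to be crawled, one file per URL
    private final Path claimed; // URLs already taken by some instance

    SharedDirQueue(Path root) throws IOException {
        pending = Files.createDirectories(root.resolve("pending"));
        claimed = Files.createDirectories(root.resolve("claimed"));
    }

    /** Adds a URL: write it under a temporary name, then rename it into place. */
    void offer(String url) throws IOException {
        String name = UUID.randomUUID().toString();
        Path tmp = pending.resolve(name + ".tmp");
        Files.write(tmp, url.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, pending.resolve(name + ".url"), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Claims one URL. The rename can succeed for only one instance per file. */
    Optional<String> poll(String instanceId) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(pending, "*.url")) {
            for (Path entry : entries) {
                Path target = claimed.resolve(instanceId + "-" + entry.getFileName());
                try {
                    Files.move(entry, target, StandardCopyOption.ATOMIC_MOVE);
                } catch (NoSuchFileException alreadyTaken) {
                    continue; // another instance claimed this URL first, try the next one
                }
                return Optional.of(
                        new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty(); // nothing left to claim right now
    }
}
```

Each instance would call `poll` with its own id, so every URL ends up with exactly one instance. The sketch makes no attempt at ordering or at detecting duplicate URLs, both of which a real frontier would need.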
Original comment by giovanni...@gmail.com
on 7 Oct 2013 at 4:56
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 3:08
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 3:10
Is anybody interested in pursuing this with me?
Original comment by r.hamn...@gmail.com
on 27 Aug 2014 at 10:19
I am not there yet.
I still have a lot on my plate to make crawler4j more stable before optimizing it.
But please go ahead; I am willing and able to help as much as I can.
Please note that I have planned this:
https://code.google.com/p/crawler4j/issues/detail?id=271
with the original author (Yasser). It will change the architecture quite a
bit and is supposed to give the crawler a huge boost.
Original comment by avrah...@gmail.com
on 28 Aug 2014 at 5:13
Original issue reported on code.google.com by
nishant....@gmail.com
on 26 Jul 2011 at 5:44