khuongduyit / crawler4j

Automatically exported from code.google.com/p/crawler4j

Requests Per Second Per Host #69

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
crawler4j limits requests per second through its politeness setting. Is there 
any way to apply that limit per host instead, for each seed URL?

Original issue reported on code.google.com by jamalrai...@gmail.com on 12 Aug 2011 at 10:04

GoogleCodeExporter commented 9 years ago
No, the current implementation does not keep different URL queues for different 
hosts.

-Yasser

Original comment by ganjisaffar@gmail.com on 12 Aug 2011 at 11:30
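
For anyone looking for a starting point: one way to get a per-host delay without separate URL queues is to record, for each host, the earliest time the next request may go out. The sketch below is only an illustration under that assumption. `HostPoliteness` and `reserve` are hypothetical names, not part of crawler4j, and the caller is expected to sleep for the returned wait before fetching.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of crawler4j: tracks the next free fetch slot per host. */
class HostPoliteness {
    private final long delayMillis;
    // host -> earliest time (ms) at which the next request to that host may start
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostPoliteness(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Reserves the next fetch slot for the URL's host and returns how long
     * (in milliseconds) the caller should wait before fetching.
     */
    synchronized long reserve(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        // The slot is "now" if the host is idle, otherwise the end of its current delay.
        long slot = Math.max(nowMillis, nextAllowed.getOrDefault(host, 0L));
        nextAllowed.put(host, slot + delayMillis);
        return slot - nowMillis;
    }
}
```

A crawler thread could call `reserve(url, System.currentTimeMillis())` and sleep for the result. Hosts do not block each other, but a thread that draws a busy host still sits idle, which is what separate per-host queues would avoid.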

GoogleCodeExporter commented 9 years ago

Original comment by ganjisaffar@gmail.com on 12 Aug 2011 at 11:30

GoogleCodeExporter commented 9 years ago
Any suggestions if I want to work on that myself? How could I do it?

Original comment by jamalrai...@gmail.com on 12 Aug 2011 at 11:32

GoogleCodeExporter commented 9 years ago
I have a frontier implementation that maintains a separate queue per host, and 
I am attaching it. The one drawback is that it creates a new working queue for 
every host encountered during a crawl. That should be fine for a small number 
of hosts.

If the implementation looks good and the frontier is made configurable, it 
could be used.

The changed files are attached. The politeness thread sleep has been removed 
from the page fetcher.

Original comment by omkarash...@gmail.com on 5 Jun 2012 at 8:59
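
The attached files are not reproduced in this export, so here is a rough sketch of the general idea only. It is not the implementation from the comment above and not crawler4j's `Frontier`. `PerHostFrontier`, `schedule`, and `next` are hypothetical names. It keeps one FIFO queue per host, hands out a URL only when that host's delay has elapsed, and drops a host's queue once it is empty (an addition of this sketch) so that idle hosts do not hold a queue.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

/** Hypothetical in-memory frontier with one queue per host; illustration only. */
class PerHostFrontier {
    private final long delayMillis;
    // Insertion-ordered so hosts can be rotated: a served host moves to the end.
    private final Map<String, Queue<String>> queues = new LinkedHashMap<>();
    // host -> earliest time (ms) of its next fetch; this map still grows with every host seen.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PerHostFrontier(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    synchronized void schedule(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        queues.computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayDeque<>()).add(url);
    }

    /** Returns the next URL whose host may be fetched now, or null if none is ready. */
    synchronized String next(long nowMillis) {
        String ready = null;
        for (String host : queues.keySet()) {
            if (nextAllowed.getOrDefault(host, 0L) <= nowMillis) {
                ready = host;
                break;
            }
        }
        if (ready == null) {
            return null;
        }
        // Queues in the map are never empty, so poll() always yields a URL here.
        Queue<String> queue = queues.remove(ready);
        String url = queue.poll();
        nextAllowed.put(ready, nowMillis + delayMillis);
        if (!queue.isEmpty()) {
            queues.put(ready, queue); // re-append so other hosts are considered first
        }
        return url;
    }
}
```

With this shape the politeness wait lives in the frontier rather than in the page fetcher: a worker that gets `null` from `next(System.currentTimeMillis())` can back off briefly and retry. A real frontier would also need persistence and duplicate detection, both left out here.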

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:10

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:12