jungjonghun / crawler4j

Automatically exported from code.google.com/p/crawler4j

Huge throughput improvement #271

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, each crawler thread fetches a single page at a time and sits idle
while that fetch is in progress.

We could make this significantly more efficient by moving to async IO: submit a
large number of requests at once and process each response as soon as it is
ready.
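The async IO idea can be sketched with Java's `CompletableFuture`. All requests are submitted up front, and each page is handled by a callback as soon as its own fetch finishes, not in submission order. This is a minimal, self-contained sketch, not crawler4j code. `AsyncFetchSketch` and `simulatedFetch` are hypothetical: `simulatedFetch` completes after a timer delay so the example runs without a network. In a real implementation the future would more likely come from a non-blocking HTTP client such as `java.net.http.HttpClient.sendAsync` (Java 11+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncFetchSketch {
    // Hypothetical stand-in for a non-blocking HTTP call: the future completes
    // after a delay, and no thread is blocked waiting on it in the meantime.
    static CompletableFuture<String> simulatedFetch(ScheduledExecutorService timer,
                                                    String url, long delayMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        timer.schedule(() -> result.complete("<html>" + url + "</html>"),
                delayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    public static List<String> crawl(List<String> urls) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        try {
            // Submit every request up front; none of these calls blocks.
            for (int i = 0; i < urls.size(); i++) {
                String url = urls.get(i);
                long delayMs = 50L * (urls.size() - i); // later URLs finish sooner
                inFlight.add(simulatedFetch(timer, url, delayMs)
                        // Process each page as soon as its own fetch completes.
                        .thenAccept(body -> processed.add(url)));
            }
            // Wait only once, for the whole batch.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            timer.shutdown();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        System.out.println(crawl(List.of("a", "b", "c")));
    }
}
```

Because later URLs get shorter simulated delays, pages are typically processed out of submission order. That is the point of the design: slow responses no longer hold up fast ones, and no thread is parked per in-flight request.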

We would also need separate threads for submitting fetch requests, parsing
HTML pages...
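Separate fetch and parse threads can be sketched as a producer/consumer pipeline. A fetch stage hands raw pages to a parse stage through a bounded `BlockingQueue`, so parsing does not stall fetching, and a slow parser applies back-pressure instead of letting memory grow. This is a hypothetical illustration under simplifying assumptions. `PipelineSketch`, the in-memory `pages` list standing in for fetched content, the `POISON` end marker, and the regex-based link extraction are all invented for the example and do not reflect how crawler4j actually parses HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {
    // Sentinel telling the parse stage that no more pages will arrive.
    private static final String POISON = "__END__";
    // Deliberately naive link extraction; a real crawler would use an HTML parser.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    public static List<String> run(List<String> pages) {
        BlockingQueue<String> fetched = new ArrayBlockingQueue<>(16);
        List<String> links = new ArrayList<>(); // written only by the parser thread

        // Stage 1: stand-in for the fetch side; hands raw HTML to the parser.
        Thread fetcher = new Thread(() -> {
            try {
                for (String html : pages) {
                    fetched.put(html); // blocks if the parser falls behind
                }
                fetched.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: parses pages independently of fetching.
        Thread parser = new Thread(() -> {
            try {
                while (true) {
                    String html = fetched.take();
                    if (html.equals(POISON)) {
                        break;
                    }
                    Matcher m = HREF.matcher(html);
                    while (m.find()) {
                        links.add(m.group(1));
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetcher.start();
        parser.start();
        try {
            fetcher.join();
            parser.join(); // join() makes the parser's writes to links visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pipeline", e);
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "<a href=\"http://example.com/1\">1</a>",
                "<a href=\"http://example.com/2\">2</a><a href=\"http://example.com/3\">3</a>");
        System.out.println(run(pages));
    }
}
```

With a single parser the queue preserves page order. Scaling out would mean several fetcher or parser threads sharing the same queue, with one end marker per parser.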

This is a major architectural change.

Original issue reported on code.google.com by avrah...@gmail.com on 10 Aug 2014 at 12:06

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:51

GoogleCodeExporter commented 9 years ago
For reference:
https://code.google.com/p/crawler4j/issues/detail?id=61

Original comment by avrah...@gmail.com on 28 Aug 2014 at 5:14