xrma / crawler4j

Automatically exported from code.google.com/p/crawler4j

Improper operation when there is more than one crawl thread (thread-safety problem) #103

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run a crawl with multiple crawler threads.

What is the expected output? What do you see instead?

What version of the product are you using?
3

Please provide any additional information below.

The class edu.uci.ics.crawler4j.fetcher.PageFetcher is used as if it were
thread safe (it has only one instance), but it is not. There are at least
two instance variables (entity, fetchedUrl) that break thread safety.
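
To illustrate the kind of hazard described here, the following is a minimal sketch and not the actual crawler4j code. The names UnsafeFetcher, SafeFetcher, and countMismatches are hypothetical, and the "download" is simulated with Thread.yield(). It assumes only what the report states: one fetcher instance is shared by several crawler threads, and per-request state is kept in an instance field.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SharedFetcherDemo {

    /** Hypothetical fetcher that keeps per-request state in an instance field. */
    static class UnsafeFetcher {
        private String fetchedUrl; // shared by every thread using this instance

        String fetch(String url) {
            fetchedUrl = url;   // another thread may overwrite this field...
            Thread.yield();     // ...while this thread is busy (simulated download)
            return fetchedUrl;  // may now hold a URL from a different request
        }
    }

    /** Same shape, but per-request state lives in a local variable. */
    static class SafeFetcher {
        String fetch(String url) {
            String fetched = url; // local: visible to the calling thread only
            Thread.yield();
            return fetched;
        }
    }

    /** Runs fetch from several threads and counts results that do not match the request. */
    static int countMismatches(Function<String, String> fetch, int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger mismatches = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            final String url = "http://example.com/page" + t; // placeholder URL
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    if (!url.equals(fetch.apply(url))) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        UnsafeFetcher unsafe = new UnsafeFetcher();
        SafeFetcher safe = new SafeFetcher();
        // The unsafe count varies from run to run; the safe count is always 0.
        System.out.println("unsafe mismatches: " + countMismatches(unsafe::fetch, 4, 100_000));
        System.out.println("safe mismatches:   " + countMismatches(safe::fetch, 4, 100_000));
    }
}
```

Keeping per-request data in locals (or in a result object returned to the caller) is one common way to make a shared instance safe; it is not necessarily what the project itself did.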

Original issue reported on code.google.com by ohad...@gmail.com on 12 Jan 2012 at 12:54

GoogleCodeExporter commented 9 years ago
Thanks for pointing this out. I had refactored the PageFetcher and missed
that. I fixed it in this change:
http://code.google.com/p/crawler4j/source/detail?r=b0fd2cbed00843eca3ee52b498a95f13b6f67ac1

I will release the new version with this fix soon.

-Yasser

Original comment by ganjisaffar@gmail.com on 13 Jan 2012 at 7:16