xrma / crawler4j

Automatically exported from code.google.com/p/crawler4j

WARN Could not remove: [page...] from list of processed pages. #180

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. config.setResumableCrawling(true);

What is the expected output? What do you see instead?
WARN  Could not remove: [page...] from list of processed pages.

What version of the product are you using?
Latest!

Please provide any additional information below.

Original issue reported on code.google.com by cf.wfwei@gmail.com on 25 Nov 2012 at 3:10

GoogleCodeExporter commented 9 years ago
Version 3.3

Original comment by cf.wfwei@gmail.com on 25 Nov 2012 at 7:24

GoogleCodeExporter commented 9 years ago
I have the same issue. Can anybody tell me whether this issue has negative
consequences for frontier exploration (i.e., will the crawler revisit pages it
has already visited)?

Thanks

Original comment by serxhiod...@gmail.com on 7 Jan 2013 at 12:43

GoogleCodeExporter commented 9 years ago
https://github.com/yasserg/crawler4j/commit/7b8bf91aab517757f4b62bd3ca22546e105a736b

There is a temporary workaround in
src/main/java/edu/uci/ics/crawler4j/frontier/WorkQueues.java.
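
This thread does not describe what the linked change actually does. As a hedged illustration only, the sketch below shows one general way a "could not remove" warning can arise in a key-value store: the entry is inserted under one key layout and removal is attempted with a different one. Everything here is hypothetical (the class `InProcessPagesSketch`, the six-byte key layout, the method names); it is not crawler4j's code and is not confirmed to be the cause of this issue.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a list of in-process pages. Not crawler4j code.
public class InProcessPagesSketch {
    private final Map<ByteBuffer, String> store = new HashMap<>();

    // Assumed insert-side key layout: priority byte, depth byte, 4-byte docid.
    static ByteBuffer fullKey(byte priority, byte depth, int docid) {
        ByteBuffer b = ByteBuffer.allocate(6);
        b.put(priority).put(depth).putInt(docid);
        b.flip(); // make the written bytes the readable content used by equals/hashCode
        return b;
    }

    // Assumed removal-side key layout that was never updated: docid only.
    static ByteBuffer docidOnlyKey(int docid) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(docid);
        b.flip();
        return b;
    }

    void put(ByteBuffer key, String url) {
        store.put(key, url);
    }

    // Returns true only if an entry with exactly this key existed.
    boolean remove(ByteBuffer key) {
        return store.remove(key) != null;
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        InProcessPagesSketch pages = new InProcessPagesSketch();
        String url = "http://example.com/";
        pages.put(fullKey((byte) 0, (byte) 1, 42), url);

        // The lookup key has a different length and content, so nothing is removed.
        if (!pages.remove(docidOnlyKey(42))) {
            System.out.println("WARN Could not remove: " + url
                    + " from list of processed pages.");
        }

        // Building the removal key the same way as the insert key succeeds.
        System.out.println("removed with matching key: "
                + pages.remove(fullKey((byte) 0, (byte) 1, 42)));
    }
}
```

In a sketch like this, the mismatched removal leaves the entry in the store, which is why the choice of key on both sides has to agree. Whether that has any bearing on revisiting pages in crawler4j is not something this thread establishes.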

Original comment by cf.wfwei@gmail.com on 7 Jan 2013 at 1:16


GoogleCodeExporter commented 9 years ago
Thank you, wfwei. I applied your fix to the latest version of the source
(3.4.1-SNAPSHOT) and it worked.

Original comment by serxhiod...@gmail.com on 7 Jan 2013 at 3:43

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 11 Aug 2014 at 2:16