xrma / crawler4j

Automatically exported from code.google.com/p/crawler4j

Missing IF-Statement causes crawler to throw a NullPointerException while syncing #175

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Start a crawling job and wait until it finishes. The following 
NullPointerException then occurs:

It looks like no thread is working, waiting for 10 seconds to make sure...
No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...
All of the crawlers are stopped. Finishing the process...
Waiting for 10 seconds before final clean up...
java.lang.NullPointerException
    at com.sleepycat.je.Database.trace(Database.java:1816)
    at com.sleepycat.je.Database.sync(Database.java:489)
    at edu.uci.ics.crawler4j.frontier.WorkQueues.sync(WorkQueues.java:187)
    at edu.uci.ics.crawler4j.frontier.Frontier.sync(Frontier.java:182)
    at edu.uci.ics.crawler4j.frontier.Frontier.close(Frontier.java:192)
    at edu.uci.ics.crawler4j.crawler.CrawlController$1.run(CrawlController.java:232)
    at java.lang.Thread.run(Thread.java:722)

The problem is at line 489 of Database.java, where sync() calls trace() with 
every parameter after the first two set to null. trace() should null-check 
each of those parameters before using it, but the check for the "key" 
parameter is missing.
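
To illustrate the kind of missing guard described above, here is a minimal sketch. It is not the Berkeley DB source: the class TraceSketch and the methods describeUnsafe and describeSafe are hypothetical names made up for this example, and the parameter list is simplified. It only shows how dereferencing an unchecked null parameter fails while a guarded version does not.

```java
// Hypothetical stand-in for a trace helper; NOT the actual Berkeley DB code.
final class TraceSketch {

    // Unguarded variant: uses key without checking it, so a null key
    // triggers a NullPointerException, as in the report above.
    static String describeUnsafe(String method, byte[] key, byte[] data) {
        StringBuilder sb = new StringBuilder(method);
        sb.append(" key=").append(key.length); // throws when key is null
        if (data != null) {
            sb.append(" data=").append(data.length);
        }
        return sb.toString();
    }

    // Guarded variant: every optional parameter is checked before use.
    static String describeSafe(String method, byte[] key, byte[] data) {
        StringBuilder sb = new StringBuilder(method);
        if (key != null) {
            sb.append(" key=").append(key.length);
        }
        if (data != null) {
            sb.append(" data=").append(data.length);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A sync-style call passes null for both optional parameters.
        System.out.println(describeSafe("sync", null, null));
        try {
            describeUnsafe("sync", null, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded variant threw NullPointerException");
        }
    }
}
```

With both optional parameters null, the guarded variant returns just the method name, while the unguarded one fails on the first dereference.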

Original issue reported on code.google.com by robert.g...@gmail.com on 9 Oct 2012 at 1:33

GoogleCodeExporter commented 9 years ago
I guess this can be closed here. I just saw that this file is part of the 
Sleepycat distribution.

Original comment by robert.g...@gmail.com on 9 Oct 2012 at 1:35

GoogleCodeExporter commented 9 years ago
I have the same problem.

java.lang.NullPointerException
        at com.sleepycat.je.Database.trace(Database.java:1816)
        at com.sleepycat.je.Database.sync(Database.java:489)
        at edu.uci.ics.crawler4j.frontier.WorkQueues.sync(WorkQueues.java:189)
        at edu.uci.ics.crawler4j.frontier.Frontier.sync(Frontier.java:183)
        at edu.uci.ics.crawler4j.frontier.Frontier.close(Frontier.java:193)
        at edu.uci.ics.crawler4j.crawler.CrawlController$1.run(CrawlController.j
ava:232)
        at java.lang.Thread.run(Unknown Source)

Original comment by tru3....@gmail.com on 16 Jul 2014 at 6:17

GoogleCodeExporter commented 9 years ago
Fixed, works in v3.5

Original comment by avrah...@gmail.com on 11 Aug 2014 at 2:12

GoogleCodeExporter commented 9 years ago
I am suffering the same issue. I am running v3.5 so any help would be 
appreciated.

Original comment by jack.w.w...@gmail.com on 18 Oct 2014 at 11:30

GoogleCodeExporter commented 9 years ago
I am reopening this issue.

But I need a clear scenario.

Jack, can you provide a scenario that causes this exception to be thrown?

Original comment by avrah...@gmail.com on 20 Oct 2014 at 9:39

GoogleCodeExporter commented 9 years ago
I run several crawls concurrently, and each instance of CrawlController 
eventually throws this exception causing the thread to crash.

Original comment by jack.w.w...@gmail.com on 6 Dec 2014 at 1:49

GoogleCodeExporter commented 9 years ago
Can you please check it against v4.0 of crawler4j?

This is a new release with many fixes.
If the problem also appears in v4.0, I can look into it.

Original comment by avrah...@gmail.com on 7 Dec 2014 at 12:10

GoogleCodeExporter commented 9 years ago
I am currently running off the latest source code in the repo, and now it 
happens less often, but still seems to occur.

Original comment by jack.w.w...@gmail.com on 8 Dec 2014 at 1:55

GoogleCodeExporter commented 9 years ago
Can you post the complete stack trace, please?

Original comment by avrah...@gmail.com on 8 Dec 2014 at 1:56

GoogleCodeExporter commented 9 years ago
java.lang.NullPointerException
        at com.sleepycat.je.Database.trace(Database.java:1816)
        at com.sleepycat.je.Database.sync(Database.java:489)
        at edu.uci.ics.crawler4j.frontier.WorkQueues.sync(WorkQueues.java:189)
        at edu.uci.ics.crawler4j.frontier.Frontier.sync(Frontier.java:183)
        at edu.uci.ics.crawler4j.frontier.Frontier.close(Frontier.java:193)
        at edu.uci.ics.crawler4j.crawler.CrawlController$1.run(CrawlController.java:232)
        at java.lang.Thread.run(Unknown Source)

Original comment by jack.w.w...@gmail.com on 8 Dec 2014 at 2:03

GoogleCodeExporter commented 9 years ago
Fixed at Rev: e7ba2db4c596  

It is an internal bug in Berkeley DB's code.

No update has come from Berkeley DB for over a year.

After investigating, I removed several of our calls to their problematic 
code, as those calls were redundant anyway.

More details can be found in the commit description.

Original comment by avrah...@gmail.com on 9 Dec 2014 at 1:29