jesbin / crawler4j

Automatically exported from code.google.com/p/crawler4j

Environment daemon threads keep running after CrawlController.shutdown() #260

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Add System.exit(0) to the end of your main method and set a breakpoint on it
2. Run your program in debug mode
3. After it reaches the breakpoint, look at the running threads

What is the expected output? What do you see instead?
Threads "Checkpointer", "INCompressor" and "Cleaner-1" should not be there.

What version of the product are you using?
3.5

Please provide any additional information below.
I'm trying to create and run crawlers in a loop. I'm creating the CrawlConfig, 
PageFetcher, RobotstxtServer and CrawlController for each domain I want to 
crawl. Everything works fine, but with every new domain 3 more threads are left 
running. Since I have a large list of domains to crawl, at some point this will 
become a major issue. 
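
For reference, a minimal sketch of the per-domain loop described above, assuming the standard crawler4j 3.5 setup; MyCrawler (a WebCrawler subclass), the seed list and the storage path are placeholders, not taken from the report:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class PerDomainCrawl {
    public static void main(String[] args) throws Exception {
        String[] seeds = {"http://example.com/", "http://example.org/"};    // placeholder domains
        for (String seed : seeds) {
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder("/tmp/crawl/" + seed.hashCode()); // separate store per domain

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtServer robotstxtServer =
                    new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
            CrawlController controller =
                    new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.start(MyCrawler.class, 5);   // blocks until this domain's crawl finishes
            // After each iteration the JE threads "Checkpointer", "INCompressor"
            // and "Cleaner-1" stay alive, i.e. three extra threads per domain.
        }
        System.exit(0);   // put the breakpoint here to inspect the leftover threads
    }
}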

I think that com.sleepycat.je.Environment#close should be called when 
CrawlController finishes its work.
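
The attached patch is not inlined in this thread, but the gist of the suggestion can be sketched as follows; everything except com.sleepycat.je.Environment#close itself (the helper name and the null check) is hypothetical:

import com.sleepycat.je.Environment;

// Hypothetical helper: once the crawl is finished and the frontier/DocID
// databases are closed, closing the JE Environment stops the "Checkpointer",
// "INCompressor" and "Cleaner-1" daemon threads seen in the thread dump.
final class EnvironmentCloser {
    private EnvironmentCloser() {}

    static void closeQuietly(Environment env) {
        if (env != null) {
            env.close();   // com.sleepycat.je.Environment#close
        }
    }
}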

I'm attaching the thread dump after 3 domains were crawled, just before exiting 
the JVM.

Original issue reported on code.google.com by usho...@gmail.com on 9 Apr 2014 at 9:23

Attachments:

GoogleCodeExporter commented 8 years ago
How did you get a thread dump? I have also noticed the same issue. I also
noticed that there is a memory leak somewhere in sleepycat. Could it be
caused by sleepycat not closing?

Original comment by jeger...@gmail.com on 2 Jun 2014 at 11:13

GoogleCodeExporter commented 8 years ago
Use startNonBlocking instead of start, and then use a while loop to check the status.

Example:
controller.startNonBlocking(CrawlProcess.class, numberOfCrawlers);
while (!controller.isFinished()) {
    Thread.sleep(1000);   // poll once a second instead of busy-waiting
}

This way it only closes when it is done. I actually created a separate process
to run crawlers in their own threads; then I close the threads when they are
done.

Example (sketched below):
StartThread -> startNonBlocking -> isFinished == true -> closeCrawler -> CloseThread

Then repeat.
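
A sketch of that lifecycle, assuming the controller, CrawlProcess and numberOfCrawlers from the earlier snippet and an enclosing method that declares throws InterruptedException:

Thread crawlThread = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            controller.startNonBlocking(CrawlProcess.class, numberOfCrawlers);
            while (!controller.isFinished()) {
                Thread.sleep(1000);       // poll instead of busy-waiting
            }
            controller.shutdown();        // the "closeCrawler" step
            controller.waitUntilFinish(); // let the crawler threads exit cleanly
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});
crawlThread.start();   // "StartThread"
crawlThread.join();    // "CloseThread": wait for this crawl before repeating with the next one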

Original comment by jeger...@gmail.com on 4 Jun 2014 at 2:00

GoogleCodeExporter commented 8 years ago
I used the IntelliJ IDEA debugger to stop on the breakpoint and then clicked
on "thread dump".

Original comment by usho...@gmail.com on 4 Jun 2014 at 2:06

GoogleCodeExporter commented 8 years ago
Thank you! Did you find a solution for your issue?

Original comment by jeger...@gmail.com on 4 Jun 2014 at 5:28

GoogleCodeExporter commented 8 years ago
Yes, I've checked out the author's source code and modified it. I'm attaching 
the patch.
Hope this helps.
Uri 

Original comment by usho...@gmail.com on 5 Jun 2014 at 8:48

Attachments:

GoogleCodeExporter commented 8 years ago
Well done!!! Thank you so much! Keep in touch.

Original comment by jeger...@gmail.com on 12 Jun 2014 at 6:39

GoogleCodeExporter commented 8 years ago
I think there is another leak within sleepycat. I eventually run out of memory
when I run the crawlers for a few days.

Original comment by jeger...@gmail.com on 17 Jun 2014 at 2:10

GoogleCodeExporter commented 8 years ago
This is the heap dump analyzed by Eclipse Memory Analyzer.

1,790 instances of "com.sleepycat.je.tree.BIN", loaded by 
"java.net.URLClassLoader @ 0x8a842fa8" occupy 29,979,440 (57.61%) bytes. These 
instances are referenced from one instance of 
"java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "<system class 
loader>"

Keywords
java.net.URLClassLoader @ 0x8a842fa8
com.sleepycat.je.tree.BIN
java.util.concurrent.ConcurrentHashMap$Node[]

The number of com.sleepycat.je.tree.BIN instances continues to grow until the
JVM runs out of memory. Do you have any ideas what could cause this?
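
For what it's worth, com.sleepycat.je.tree.BIN objects are JE's bottom-level B-tree nodes and live in the JE cache, so one thing to check (a sketch under that assumption, not a confirmed fix for this report) is whether the Environment's cache is left unbounded. Crawler4j 3.5 creates its Environment internally, so wiring a cap in would need a patch like the one attached earlier; the storage path and the 64 MB figure are placeholders:

import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

public class BoundedCacheEnvironment {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        // Cap the in-memory B-tree (BIN/IN) cache so it cannot grow without bound.
        envConfig.setCacheSize(64L * 1024 * 1024);   // 64 MB
        Environment env = new Environment(new File("/tmp/crawler4j-frontier"), envConfig);
        // ... open databases and crawl here, then:
        env.close();   // also needed, as above, to stop the JE daemon threads
    }
}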

Original comment by jeger...@gmail.com on 17 Jun 2014 at 2:40

GoogleCodeExporter commented 8 years ago
Fixed.

Leak is closed.

Revision hash: 9efaeef20c30 

Original comment by avrah...@gmail.com on 11 Aug 2014 at 8:28