jesbin / crawler4j

Automatically exported from code.google.com/p/crawler4j

Memory usage #337

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Feed the crawler a list of websites to crawl.
2. Run the crawling operation in a while loop (see the sketch below).
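
For reference, a minimal sketch of the loop described above, assuming crawler4j's standard CrawlController API. The storage folder, seed URL, thread count, and the MyCrawler class (a WebCrawler subclass) are placeholders, not part of the original report:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class LoopingCrawl {
    public static void main(String[] args) throws Exception {
        while (true) {  // run consecutive crawls non-stop
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder("/tmp/crawl");  // placeholder path

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtServer robotstxtServer =
                    new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
            CrawlController controller =
                    new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed("http://www.example.com/");  // placeholder seed
            // Blocks until this crawl finishes. One would expect the heap
            // to be reclaimable after each iteration, but it keeps growing.
            controller.start(MyCrawler.class, 8);  // MyCrawler: hypothetical WebCrawler subclass
        }
    }
}
```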

What is the expected output? What do you see instead?

It's not about the output; it's about heap consumption. When running several consecutive crawling operations, the expected behavior is that heap usage does not increase over time: after a crawl finishes, all of its resources should be released, unless there is some kind of memory leak.
I'm planning to run a crawler non-stop, crawling thousands of websites starting from one, but that isn't possible with the current implementation of crawler4j, since heap usage keeps increasing until the application crashes.
As you can see, the instances of Byte objects after about 30 minutes of running add up to 1,500 MB. Right now the entire heap is around 2,300 MB; before long it will reach 3,000 MB and the application will crash.
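
One way to distinguish a genuine leak from garbage that simply has not been collected yet is to request a GC and log used heap after each crawl iteration. A minimal sketch using only the JDK (nothing crawler4j-specific is assumed):

```java
public class HeapCheck {
    /** Request a GC (hint only), then return used heap in MB. */
    static long usedHeapMb() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Call this after each crawl iteration; a value that rises
        // monotonically across many iterations points to a real leak
        // rather than garbage awaiting collection.
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```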

Any idea what could be causing this behavior?

What version of the product are you using?

Please provide any additional information below.

Original issue reported on code.google.com by feuoo...@gmail.com on 10 Feb 2015 at 7:54

Attachments:

GoogleCodeExporter commented 8 years ago
Has anyone ever found a solution for this? I'm running into the same problem.

Original comment by olafw...@gmail.com on 28 Jul 2015 at 7:47