What steps will reproduce the problem?
1. Feed the crawler a list of websites to crawl.
2. Run the crawling operation in a while loop (a rough sketch of what I mean is given below).
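To make the loop concrete, this is roughly the shape of it: the standard crawler4j setup wrapped in a while loop. The seed list, the storage folder and the no-op crawler class are only placeholders for illustration, not my actual code, and the API names are from the crawler4j 4.x examples.

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class EndlessCrawl {

    // No-op crawler used only to exercise the crawl loop (placeholder).
    public static class NoOpCrawler extends WebCrawler { }

    public static void main(String[] args) throws Exception {
        String[] seeds = { "http://www.example.com/" };      // list of websites to crawl (placeholder)

        while (true) {                                        // one crawl per iteration, forever
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder("/tmp/crawler4j");   // placeholder storage folder

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            for (String seed : seeds) {
                controller.addSeed(seed);
            }

            // Blocking start; when it returns the crawl is over, and everything it
            // allocated should in principle be eligible for garbage collection.
            controller.start(NoOpCrawler.class, 4);
        }
    }
}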
What is the expected output? What do you see instead?
It's not about the output, it's about heap consumption. When running several consecutive crawling operations, the expected behavior is that heap usage does not keep increasing over time: after a crawl is over, all of its resources should be released, unless there is some kind of memory leak?
I'm planning to have a crawler that keeps running non-stop and crawls thousands of websites starting from one, but that isn't possible with the current implementation of crawler4j, because heap usage keeps increasing until the application crashes.
As you can see in the attachment, instances of Byte objects take up 1,500 MB after about 30 minutes of running.
Right now the entire heap is around 2,300 MB; in some time it will reach 3,000 MB and crash.
Any idea what could be causing this behavior?
What version of the product are you using?
Please provide any additional information below.
Original issue reported on code.google.com by feuoo...@gmail.com on 10 Feb 2015 at 7:54