Issue (open), opened by cgr71ii 2 years ago
What version of JPype is being used here?
Package JPype1, version 1.3.0
Given that there is only one JPype call, I suspect this is all on the Java side. Older versions of JPype had reference-counting issues that could cause Java's memory footprint to leak. I would repeat the experiment in pure Java to verify that the issue is indeed a pure Java problem.
Hi!
I've been using Boilerpipe with Bitextor, and everything has worked fine. The problem is that when I processed a PDF file, specifically this one, I ran out of memory and the execution failed. The error message I got is:
To take Bitextor out of the picture, I have attached the file that Bitextor generated from the PDF, which is an HTML file; this attached HTML is what causes the problem. The file size is 9.4 MB, and I don't know whether that is too large for Boilerpipe to handle. The problem is not related to PDFs as such, since I processed other PDFs and those runs finished without errors.
In the end, I figured out that the problem really was the amount of memory required (initially I thought it was a memory leak), which was very strange to me for a 9.4 MB file. I fixed the problem by increasing the amount of memory available to the JVM that jpype starts. Processing this 9.4 MB HTML file required ~52 GB in total! My system has 126 GB, and by default the JVM caps its maximum heap at roughly a quarter of physical memory, so the default max heap size here was about 30 GB. Since the process needed 52 GB and the max heap size was 30 GB, it ran out of memory.
I'm opening this issue to alert other people who might run into the same problem, and to ask: do these numbers make sense? 52 GB of memory for a 9.4 MB HTML file?
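The heap arithmetic above can be checked in a few lines of Python. This is a rough sketch, assuming the JVM uses its default ergonomics, where the max heap is `MaxRAMPercentage` (25% by default on recent JDKs) of physical memory when no `-Xmx` is given. The 126 GB and ~52 GB figures come from this issue; the function names are hypothetical.

```python
# Rough heap arithmetic for the numbers reported in this issue.
# Assumption: default JVM ergonomics, i.e. no -Xmx flag, so the max heap is
# MaxRAMPercentage (default 25.0) percent of physical memory.

DEFAULT_MAX_RAM_PERCENTAGE = 25.0


def default_max_heap_gb(physical_gb: float,
                        percentage: float = DEFAULT_MAX_RAM_PERCENTAGE) -> float:
    """Approximate default JVM max heap for a machine with physical_gb of RAM."""
    return physical_gb * percentage / 100.0


def fits_in_heap(required_gb: float, max_heap_gb: float) -> bool:
    """True if the observed peak memory use fits under the heap limit."""
    return required_gb <= max_heap_gb


physical = 126.0  # GB of RAM reported in the issue
required = 52.0   # approximate peak reported for the 9.4 MB HTML file
heap = default_max_heap_gb(physical)
print(f"default max heap ~ {heap:.1f} GB, fits: {fits_in_heap(required, heap)}")
```

With these inputs the default limit comes out at 31.5 GB, in line with the ~30 GB mentioned above, and well below the ~52 GB peak.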
The code which triggers the error:
The fix (run it before the above code; it should work, but I haven't tested it outside the actual file, so I might have missed something):
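For readers who want to try the same kind of fix, here is a minimal sketch (not the original snippet) of raising the heap limit via `jpype.startJVM` before the Boilerpipe wrapper is imported. It assumes JPype1 1.x, where `startJVM` accepts JVM options positionally and a `classpath` keyword. The `60g` value and the `classpath` list are hypothetical: if the wrapper normally starts the JVM itself, its jar files must be passed here, because the class path cannot be changed once the JVM is running.

```python
# Sketch: start the JVM with a larger max heap BEFORE importing boilerpipe.
# Assumptions: JPype1 1.x; HYPOTHETICAL heap size and jar list.

MAX_HEAP = "60g"  # hypothetical: leaves headroom above the ~52 GB peak reported


def jvm_options(max_heap: str) -> list:
    """JVM options for jpype.startJVM(); -Xmx sets the maximum heap size."""
    return [f"-Xmx{max_heap}"]


def start_jvm_with_heap(max_heap: str = MAX_HEAP, classpath=None) -> bool:
    """Start the JVM with the given max heap.

    Returns False if a JVM is already running: heap size and class path are
    fixed at JVM startup, so the new limit cannot take effect in that case.
    """
    import jpype  # imported lazily so this module loads without JPype installed

    if jpype.isJVMStarted():
        return False
    jpype.startJVM(*jvm_options(max_heap), classpath=classpath or [])
    return True


def max_heap_bytes() -> int:
    """Ask the running JVM for its effective max heap via Runtime.maxMemory()."""
    import jpype

    runtime = jpype.JClass("java.lang.Runtime").getRuntime()
    return int(runtime.maxMemory())
```

Usage would be to call `start_jvm_with_heap(classpath=[...])` first, check `max_heap_bytes()`, and only then import the Boilerpipe wrapper. An alternative that leaves the wrapper's own JVM startup untouched is to set the `JAVA_TOOL_OPTIONS` environment variable (e.g. to `-Xmx60g`) before the JVM is created, since the JVM also reads it when embedded.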
Attached file: html.tar.gz