Memory issues

Hi there,

I note that the code contains a helpful TODO: //TODO: make option to change this [buffer size]. Add F.A.Q entry for java.lang.OutOfMemoryError: Java heap space.

I've been trying to use PDBF, perhaps inappropriately, to make a nice, shiny HTML5 version of my [completed] PhD thesis. It's not that crazy and doesn't do anything mad: it contains about 1e6 characters of TeX and compiles to a 300 MiB PDF with pdflatex.

I can routinely bust not only the Java heap but also the 32-bit int limit on array size once I increase the heap with -Xmx16g -XX:+UseCompressedOops -XX:+DisableExplicitGC (or equivalent). The full trace is:

Compiling HTML...
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:834)
    at java.lang.StringBuilder.replace(StringBuilder.java:262)
    at pdbf.misc.Tools.fixXref(Tools.java:246)
    at pdbf.compilers.HTML_PDF_Compiler.main(HTML_PDF_Compiler.java:112)
    at pdbf.PDBF_Compiler.main(PDBF_Compiler.java:167)

It would be really nice to be able to get around this somehow, but I recognise that rebuilding the codebase so that it doesn't read the whole PDF into memory at once would be a lot of work. Do you have any ideas?
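
To make the question concrete, here is a minimal sketch of the kind of approach I have in mind: replacing a byte range while streaming from one file to another through a fixed-size buffer, instead of calling StringBuilder.replace on the whole document. I haven't worked out what Tools.fixXref actually needs, so the class name StreamingReplace, the method replaceRange, and the 64 KiB buffer are all hypothetical and not part of PDBF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of PDBF: rewrites one byte range of a file
// while holding only a small buffer in memory.
final class StreamingReplace {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed size, tune as needed

    /** Copies src to dst, substituting the bytes in [start, end) with replacement. */
    public static void replaceRange(Path src, Path dst, long start, long end,
                                    byte[] replacement) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad range");
        }
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buf = new byte[BUFFER_SIZE];
            transfer(in, out, start, buf);        // bytes before the range, unchanged
            out.write(replacement);               // the new content
            transfer(in, null, end - start, buf); // read and discard the old range
            int n;
            while ((n = in.read(buf)) != -1) {    // bytes after the range, unchanged
                out.write(buf, 0, n);
            }
        }
    }

    // Reads exactly count bytes from in; writes them to out unless out is null.
    private static void transfer(InputStream in, OutputStream out, long count,
                                 byte[] buf) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                throw new IOException("unexpected end of file");
            }
            if (out != null) {
                out.write(buf, 0, n);
            }
            remaining -= n;
        }
    }
}
```

Offsets are long rather than int, so the file size is not bounded by the array limit, and memory use stays at the buffer size plus the replacement bytes. If several ranges need patching, they would have to be applied in ascending offset order within a single pass.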

Thanks for a great project!

uds-datalab / PDBF

Memory issues #41