hollisticated-horse opened 3 years ago
It seems to use quite a bit of memory... could the data not be stored in a temporary file on the fly, to avoid hogging RAM? Usage went from 5 to 8+ GB on a 1000-page PDF.
It finished loading, and I was able to save the hOCR output to a .html file.
But now it won't load that file anymore.
Running gimagereader-gtk --gtk-debug=FLAGS
dumps this when I try to open the generated .html file:
Bytes: 0xE2 0x80 0x26 0x71
fsize 10; x_wconf 76" class="ocrx_word" id="word_34_119" lang="eng">“inherited
^
Entity: line 20949: parser error : EntityRef: expecting ';'
x_fsize 9; x_wconf 35" class="ocrx_word" id="word_35_244" lang="eng">(�&�&�
^
Entity: line 962995: parser error : EntityRef: expecting ';'
x_fsize 9; x_wconf 56" class="ocrx_word" id="word_824_393" lang="eng">�&�&�
^
(gimagereader-gtk:277463): glibmm-ERROR **: 19:08:03.778:
unhandled exception (type std::exception) in signal handler:
what:
Validity error:
Line 1028405, column 131 (error):
xmlSAX2Characters: huge text node
Trace/breakpoint trap (core dumped)
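The "EntityRef: expecting ';'" errors mean the hOCR output contains bare "&" characters that were never escaped as "&amp;", so the XML parser rejects the file. As a one-off workaround, here is a small sketch (the helper name escape_bare_ampersands and the file path are mine, not part of gImageReader) that escapes such ampersands before reopening the file:

```python
import re

def escape_bare_ampersands(text):
    # Replace any '&' that is not already the start of an entity
    # reference (e.g. '&amp;', '&#8220;') with '&amp;'.
    return re.sub(r'&(?!#?\w+;)', '&amp;', text)

# Example usage on the broken hOCR file (path is hypothetical):
# with open('output.html', encoding='utf-8', errors='replace') as f:
#     fixed = escape_bare_ampersands(f.read())
# with open('output_fixed.html', 'w', encoding='utf-8') as f:
#     f.write(fixed)
```

This only repairs well-formedness; it does not recover whatever characters the OCR originally garbled into "�".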
Is the text so huge it can't handle it, or is there an encoding error?
Got an outright crash when exporting to ODT: backtrace_gimage.txt backtrace_gimage 2 .txt
Then another when importing the original .xml file generated by gImageReader:
import_xml_backtrack_gimage.txt
What should I do?
Have I found the limit?
Hi, new ticket, different issue: it seems that with big PDFs, the software hangs or has a hard time staying stable. The "force quit or wait" dialogue comes and goes... Since I don't have the technical skills to track down and fix this myself, can I help diagnose it or provide info for debugging in any way?
Edit: the original file is a ~144 MB .pdf; the generated .html is about the same size, 1097 pages, full text and images...