Closed vivadavid closed 1 year ago
Sorry for the late reply.
See https://github.com/tesseract-ocr/tesseract/issues/1662: if your Tesseract is built with OpenMP support, this is likely the reason. You should either rebuild Tesseract with OpenMP support disabled (which is the upstream default and recommendation), or set the OMP_THREAD_LIMIT=1 environment variable before launching gImageReader, for example on Linux with gimagereader-qt5:
$ export OMP_THREAD_LIMIT=1
$ gimagereader-qt5
Hi, thanks for your reply! It looks a bit complicated, and the language packages I use for Tesseract are downloaded through gImageReader anyway. Is it something that could be fixed or adjusted in a future release of your programme?
If you are using the latest 3.4.1 Windows build, the bundled tesseract is compiled without OpenMP support, so it should not suffer from the performance penalty.
I've just tried version 3.4.1 and, from 28-29 minutes, this time the OCR process took around 5 minutes 30 seconds, so that's great! Thanks!
I suppose I should open a new thread, but as I described in my first message, I keep getting an error message whenever I want to import JP2 images. Isn't this format supported?
Looks like a crash in the Jasper JP2 library - can you share the image which triggers this?
There you go:
Hi @manisandro, just a quick message to let you know that I've just tried the OCR tool in PDF24 and my JP2 images weren't supported there either. It must be a general issue.
I see that there is an assertion error in the Jasper JP2 image library which triggers the crash. I haven't had the time to debug it further though.
@manisandro A small tip. On Unix-like systems, you can do the OMP_THREAD_LIMIT workaround right from the executable via setenv followed by execvp somewhere at the beginning of main() (example).
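A minimal sketch of that setenv-then-execvp idea, in C. The helper name ensure_omp_thread_limit is hypothetical and is not taken from gImageReader or from the example linked above. The re-exec is used on the assumption that the OpenMP runtime reads OMP_THREAD_LIMIT when the process starts, so changing the variable later in the same process may have no effect.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and behaviour are assumptions for illustration).
 * If OMP_THREAD_LIMIT is not set, set it to 1 and replace the current
 * process with a fresh copy of itself, so the new process starts with the
 * variable already in its environment.
 *
 * Returns 0 if the variable was already set (nothing to do),
 * -1 if setenv or execvp failed. Does not return if the exec succeeds.
 */
int ensure_omp_thread_limit(char *const argv[])
{
    /* Already set, either by the user or by our own earlier re-exec.
       Returning here is what prevents an endless exec loop. */
    if (getenv("OMP_THREAD_LIMIT") != NULL)
        return 0;

    if (setenv("OMP_THREAD_LIMIT", "1", 1) != 0) {
        perror("setenv");
        return -1;
    }

    /* execvp searches PATH when argv[0] contains no slash. */
    execvp(argv[0], argv);

    /* Only reached if execvp failed. */
    perror("execvp");
    return -1;
}
```

It would be called as the first statement of main(argc, argv), before anything touches OpenMP: ensure_omp_thread_limit(argv);. Because the helper leaves an existing value alone, a user who exports OMP_THREAD_LIMIT themselves keeps their own setting.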
I could also limit the number of threads via the OpenMP API, but I'd rather not, as there are other parts of gImageReader which truly benefit from parallelism, so the proper solution really is to ensure that Tesseract is built properly.
Hi!
I've performed OCR on a book consisting of 354 PNG images (the originals were in JP2, but I converted them because the programme crashed every time). This is the source:
https://archive.org/details/19261928Liberacin
My settings:
It took around 28-29 minutes.
I did the same thing with VietOCR and it took less than 8 minutes.
I wanted to report it in case I did something wrong or in case there is a bug.
Thank you for your time! I love your programme!