madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

pytesseract's OpenMP runtime conflicts with CLIP #499

Closed sszzz830 closed 1 year ago

sszzz830 commented 1 year ago

Environment:
macOS Ventura 13.2.1
Python Version: 3.9.12 (main, Apr 5 2022, 01:53:17) [Clang 12.0.0 ]
CLIP Version: 1.0 (CLIP is from https://github.com/openai/CLIP.git)
pytesseract Version: 0.3.10

Problem description: After importing CLIP and pytesseract in the same Python program and running it, I get the following error message:

_'OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized. OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIBOK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/. Terminated due to signal: ABORT TRAP (6)'_

If I then set KMP_DUPLICATE_LIBOK=TRUE and rerun it, _'Terminated due to signal: SEGMENTATION FAULT (11)'_ occurs when loading the CLIP model.

Temporary workaround: Isolate the pytesseract part in an independent program, and use a pipe to communicate with the main program.
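The workaround above could look roughly like the following. This is only a minimal sketch, not the code from the affected program: `OCR_WORKER`, `run_isolated`, and `ocr_in_subprocess` are hypothetical names, and the JSON-over-stdin/stdout exchange is an assumed protocol. The point is that pytesseract (and the numpy import it triggers) only ever load in the child interpreter, so the parent process holds a single OpenMP runtime.

```python
import json
import subprocess
import sys

# Hypothetical worker source. It runs in a child interpreter, so pytesseract
# and its imports are never loaded into the parent process.
OCR_WORKER = """
import json, sys
import pytesseract
from PIL import Image

request = json.load(sys.stdin)
text = pytesseract.image_to_string(Image.open(request["image_path"]))
json.dump({"text": text}, sys.stdout)
"""


def run_isolated(worker_source, request, timeout=60):
    """Run worker_source in a fresh Python process and exchange JSON over pipes."""
    completed = subprocess.run(
        [sys.executable, "-c", worker_source],
        input=json.dumps(request),  # request goes to the child's stdin
        capture_output=True,        # reply comes back on the child's stdout
        text=True,
        timeout=timeout,
        check=True,                 # raise if the child exits with an error
    )
    return json.loads(completed.stdout)


def ocr_in_subprocess(image_path):
    """Hypothetical helper: OCR one image without importing pytesseract here."""
    return run_isolated(OCR_WORKER, {"image_path": image_path})["text"]
```

The main program would then import CLIP as usual and call `ocr_in_subprocess("page.png")` (the file name is a placeholder) whenever it needs OCR. Starting a new interpreter per call is slow, so a long-lived worker may be preferable for many images.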

The demo code and error messages are as follows. (I am temporarily unable to share the code of the program under development that actually hits these errors, so I am using this demo instead.)

[Two screenshots: the demo code and the resulting error messages]

stefan6419846 commented 1 year ago

pytesseract does not ship any binary code itself, so this must be related to its dependencies. I suspect that the conflict involves the Tesseract tool itself (which is called by pytesseract in a subprocess). If you can verify this in a clean environment by just using subprocess.run(['tesseract', '--list-langs']) or some similar calls without having pytesseract installed, please seek help in the official Tesseract user forum/group. This seems to be out of scope for pytesseract.
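A minimal sketch of that check, assuming the `tesseract` executable is on PATH (`list_tesseract_languages` is a hypothetical helper name, not part of pytesseract or Tesseract). It returns stderr as well, so any OMP message from the binary would be visible:

```python
import shutil
import subprocess


def list_tesseract_languages(executable="tesseract"):
    """Call the Tesseract CLI directly; return None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--list-langs"],
        capture_output=True,
        text=True,
    )
    # Return everything so that an OMP message on stderr is visible too.
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

Running `print(list_tesseract_languages())` in an environment without pytesseract installed shows whether the binary alone reports any OpenMP error.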

sszzz830 commented 1 year ago

Thank you very much for addressing my questions. I would like to provide some additional context regarding an issue I've encountered. Here's the situation:

If I use the subprocess module to call the Tesseract binary directly, it works without the OpenMP error.

No OMP error if tesseract is called directly

By the way, the behavior of the error strangely depends on the import order. If pytesseract is imported before CLIP, the error message appears when importing CLIP. Conversely, if CLIP is imported first, importing pytesseract doesn't immediately produce an error message, but a segmentation fault occurs when loading the CLIP model. I'm aware that pytesseract doesn't ship any binary files itself and merely calls Tesseract. However, the problem seems to manifest specifically with pytesseract.

I want to express my gratitude once again for your response, and I apologize for repeatedly asking for clarification on this matter. Your insights have been invaluable in helping me understand the situation.

sszzz830 commented 1 year ago

I found that the error occurs when pytesseract tries to import numpy. I isolated the 'import xxx' part of pytesseract and put it in a separate program. It turns out that the conflicting libraries are numpy and clip. However, in another program I could import both libraries and use them without conflict. Confusing.

(Pictures 1 and 3 show the OpenMP conflict, but picture 2 doesn't.)

[Three screenshots: import experiments with numpy and clip]

stefan6419846 commented 1 year ago

What happens if you import clip before numpy? I suspect that in your second example, torch is actually loading OpenMP as well.

Nevertheless, I still think that this is not really an issue in pytesseract, but in the way numpy and clip interact in these cases and how their binary dependencies are resolved in each case. pytesseract just imports these third-party modules here.
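One way to compare import orders without one attempt affecting the next is to run each order in a fresh interpreter. The following is a rough sketch under that assumption; `try_import_order` and `compare_import_orders` are hypothetical helper names. It only covers failures at import time, so a crash that appears later (for example when loading a model) would need a longer snippet in the child process.

```python
import itertools
import subprocess
import sys


def try_import_order(modules, timeout=120):
    """Import modules in the given order in a fresh interpreter.

    Returns (returncode, stderr) of the child process.
    """
    code = "; ".join(f"import {name}" for name in modules)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stderr


def compare_import_orders(modules):
    """Try every permutation of modules, each in its own process."""
    return {
        order: try_import_order(order)
        for order in itertools.permutations(modules)
    }
```

For this thread the call would be something like `compare_import_orders(["clip", "numpy"])`. On POSIX systems a negative return code means the child was killed by a signal, and any OMP error text ends up in the captured stderr.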

sszzz830 commented 1 year ago

I tried importing clip before numpy, and the OpenMP error disappeared. When I then import pytesseract after clip and numpy, there are no more errors (only an OpenMP warning: 'OMP: Warning #191: Forking a process while a parallel region is active is potentially unsafe.'). Thank you very much for solving my problem. =)

stefan6419846 commented 1 year ago

As this is solved, feel free to close this issue.