akash97715 opened 8 months ago
Hey there @akash97715! Hope you're doing well. It's good to see you again; let's dive into this issue together.
The error you're encountering seems to be originating from the `extract_from_images_with_rapidocr` function in the `pdf.py` file. This function attempts to import the `rapidocr_onnxruntime` package and raises an `ImportError` if it fails. The `rapidocr_onnxruntime` package is used to extract text from images when `extract_images` is set to `True`.
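To see how an installed package can still produce a "package not found" message, here is a simplified sketch of the guard pattern described above (not LangChain's exact source). A `try`/`except ImportError` around the import reports *any* `ImportError` as a missing package, including one raised by a dependency deep inside the package. The module name `fake_ocr_pkg` and the simulated `libGL.so.1` failure are hypothetical stand-ins.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_ocr_class(module_name: str):
    """Simplified guard: any ImportError becomes a 'package not found' message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Raising inside the except block keeps the original error as __context__.
        raise ImportError(
            f"`{module_name}` package not found, please install it with "
            f"`pip install {module_name}`"
        )
    return module.RapidOCR


# Hypothetical package that IS present but fails on a missing system library.
pkg_dir = Path(tempfile.mkdtemp())
(pkg_dir / "fake_ocr_pkg.py").write_text(
    'raise ImportError("libGL.so.1: cannot open shared object file: '
    'No such file or directory")\n'
)
sys.path.insert(0, str(pkg_dir))
importlib.invalidate_caches()

try:
    load_ocr_class("fake_ocr_pkg")
except ImportError as exc:
    reported = str(exc)                # what ends up in the application log
    underlying = str(exc.__context__)  # the real reason the import failed
    print("Reported:  ", reported)
    print("Underlying:", underlying)
```

The module exists on `sys.path`, yet the reported message claims it is not installed; only the chained `__context__` reveals the missing shared library.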
Even though the GitHub Actions runner logs show that the `rapidocr_onnxruntime` package is installed, the error could be due to several reasons:
1. The package might not be installed in the same Python environment that the application is running in. This can happen if there are multiple Python environments on the system.
2. The package might not be installed correctly. This can happen if there were errors during the installation process.
3. The package might not be compatible with the system. This can happen if the package requires specific system libraries that are not installed, or if it is not compatible with the version of Python that the application is using.
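One way to tell these three cases apart from inside the container is a small diagnostic that reports which interpreter is running, whether the distribution's metadata is visible to *that* interpreter, and what actually happens on import. This is a sketch using only the standard library; `diagnose` is a hypothetical helper name.

```python
import importlib
import importlib.metadata
import sys


def diagnose(dist_name: str, module_name: str) -> dict:
    """Show the running interpreter, the installed version (if any), and the real import result."""
    report = {"interpreter": sys.executable, "prefix": sys.prefix}
    try:
        report["installed_version"] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Not installed for THIS interpreter (reason 1), whatever the CI logs say.
        report["installed_version"] = None
    try:
        importlib.import_module(module_name)
        report["import_error"] = None
    except Exception as exc:
        # If installed_version is set but this fails, the package is present but broken
        # or a system library it needs is missing (reasons 2 and 3).
        report["import_error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in diagnose("rapidocr-onnxruntime", "rapidocr_onnxruntime").items():
        print(f"{key}: {value}")
```

A non-empty `installed_version` together with an `import_error` that mentions a `.so` file points at reason 3 rather than at pip.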
To resolve this issue, you can try the following:
1. Ensure that the `rapidocr_onnxruntime` package is installed in the same Python environment that the application is running in. You can do this by running `pip show rapidocr-onnxruntime` in the same environment.
2. Try reinstalling the `rapidocr-onnxruntime` package. You can do this by running `pip uninstall rapidocr-onnxruntime` followed by `pip install rapidocr-onnxruntime`.
3. Check the compatibility of the `rapidocr-onnxruntime` package with your system and Python version. You can do this by checking the package documentation or the package page on PyPI.
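A bare `pip` on the PATH can belong to a different interpreter than the one running the app. A safer variant of step 1 is to invoke pip through the current interpreter (`sys.executable -m pip`) and read the `Location` field. This is a sketch; `pip_show` and `parse_pip_show` are hypothetical helper names, and only the simple `Key: value` lines of `pip show` output are parsed.

```python
import subprocess
import sys


def parse_pip_show(output: str) -> dict:
    """Turn the top-level 'Key: value' lines printed by `pip show` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and not line.startswith(" "):  # skip indented continuation lines
            fields[key.strip()] = value.strip()
    return fields


def pip_show(package: str) -> dict:
    """Run pip with the SAME interpreter as the app, so the answer matches its environment."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", package],
        capture_output=True,
        text=True,
    )
    # pip exits non-zero when the package is not installed (or pip itself is missing).
    return parse_pip_show(result.stdout) if result.returncode == 0 else {}
```

For example, `pip_show("rapidocr-onnxruntime").get("Location")` should point inside the `site-packages` directory of the interpreter your application actually uses.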
If none of these solutions work, you might need to look into other potential issues with your system or application configuration.
Also, it would be helpful if you could provide the exact versions of the `rapidocr-onnxruntime` and `pypdf` packages that are being used in the Docker production environment. This information could help in diagnosing the issue.
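To collect those versions (plus the Python and platform details relevant to step 3) from inside the running container, a short standard-library sketch like this could be used. `version_report` is a hypothetical helper; you could also pass other relevant distributions, such as `onnxruntime` or `opencv-python`.

```python
import importlib.metadata
import platform


def version_report(dists=("rapidocr-onnxruntime", "pypdf")) -> dict:
    """Collect interpreter/platform details and installed versions of the given distributions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in dists:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in version_report().items():
    print(f"{key}: {value}")
```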
You can find more details about the `extract_from_images_with_rapidocr` function in the `pdf.py` file here.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
-Dosu
I am facing the same issue. @akash97715, were you able to resolve it?
>>> from rapidocr_onnxruntime import RapidOCR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/rapidocr_onnxruntime/__init__.py", line 4, in <module>
    from .main import RapidOCR
  File "/usr/local/lib/python3.9/site-packages/rapidocr_onnxruntime/main.py", line 8, in <module>
    import cv2
  File "/usr/local/lib/python3.9/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/usr/local/lib/python3.9/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
I found that OpenCV (`cv2`) is trying to load the OpenGL library (`libGL.so.1`) but cannot find it. However, I am unable to get the OpenGL library installed in my Docker container.
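Before changing the Dockerfile, you can confirm from Python that the missing piece is the system library rather than the pip package. This sketch uses only `ctypes` from the standard library; `shared_library_status` is a hypothetical helper. On Debian/Ubuntu images `libGL.so.1` is usually provided by an apt package such as `libgl1`; on newer Debian-based images (for example bookworm) `libgl1-mesa-glx` may no longer be available, so treat the exact package name as an assumption for your base image.

```python
import ctypes
import ctypes.util


def shared_library_status(soname: str = "libGL.so.1", short_name: str = "GL") -> dict:
    """Check whether the dynamic loader can locate and open a shared library."""
    status = {"found_by_name": ctypes.util.find_library(short_name)}
    try:
        ctypes.CDLL(soname)
        status["loadable"] = True
        status["error"] = None
    except OSError as exc:
        # A dlopen failure like the one cv2 reported surfaces here as OSError.
        status["loadable"] = False
        status["error"] = str(exc)
    return status


print(shared_library_status())
```

If `loadable` is `False` inside the container but `True` on the host, installing the system package (not reinstalling `rapidocr-onnxruntime`) is the fix.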
I am facing the same issue. The error only occurs in Docker; on my Mac it works correctly, and I'm sure the Docker image has the rapidocr-onnxruntime package installed.
self.loader = PyMuPDFLoader(self.file_path, extract_images=True)
024-03-15 13:43:42,811 - root - ERROR - rapidocr-onnxruntime package not found, please install it with pip install rapidocr-onnxruntime
Please help me resolve it.
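Because the log above only shows the wrapper message, one way to recover the real cause is to follow the exception chain: when a library raises a new exception inside an `except` block, Python keeps the original as `__context__` (or `__cause__` with `raise ... from`). This is a sketch; `innermost_exception` is a hypothetical helper, and the nested `raise` statements only simulate what the loader does.

```python
def innermost_exception(exc: BaseException) -> BaseException:
    """Follow __cause__/__context__ links down to the original exception.

    Deliberately ignores __suppress_context__ so that hidden causes are shown too.
    """
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


# Simulate a wrapper that hides a missing system library behind a generic message.
try:
    try:
        raise ImportError("libGL.so.1: cannot open shared object file: No such file or directory")
    except ImportError:
        raise ImportError("rapidocr-onnxruntime package not found, please install it with pip install rapidocr-onnxruntime")
except ImportError as err:
    root = innermost_exception(err)
    print(f"{type(root).__name__}: {root}")
```

In application code you would wrap the call that triggers the error (for example `self.loader.load()`) the same way and log `innermost_exception(err)` alongside the original message.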
On a CentOS 6 Docker image, `sudo yum install mesa-libGL` resolves it.
Same issue.
On Linux, add this to your Dockerfile:
RUN apt-get update && apt-get install -y --no-install-recommends \
libgl1-mesa-glx \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
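To catch this at image build time instead of in production, a small import check could run as a build step after the `apt-get` line, for example `RUN python smoke_check.py` (the file name is hypothetical). It imports `cv2` and `rapidocr_onnxruntime` directly, so the real error (such as the missing `libGL.so.1`) is printed instead of the generic "package not found" message.

```python
import importlib
import sys


def failed_imports(module_names) -> list:
    """Return (module, error) pairs for every module that cannot be imported."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # a missing .so can surface as ImportError or OSError
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures


def main() -> int:
    """Print each failing import to stderr; return a non-zero status if any failed."""
    failures = failed_imports(["cv2", "rapidocr_onnxruntime"])
    for name, error in failures:
        print(f"import {name} failed -> {error}", file=sys.stderr)
    return 1 if failures else 0
```

Saved as a script, the file would end with `sys.exit(main())`, so a failed import stops `docker build` instead of surfacing later at request time.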
@bhargav-xponai's solution works perfectly for me. Thanks for saving my day.
Issue you'd like to raise.
Hello Team,
We are trying to use pypdf to extract the text from PDFs and use the chunks for embedding (details are in the attached code snippet). I have installed all the required packages, and it works fine locally (Windows 10). But when I use the same code snippet and requirement.txt in Docker on Ubuntu (in prod), I get the error below:
Error while chunking the file: Error while chunking the file, Errored while loading the document: rapidocr-onnxruntime package not found, please install it with `pip install rapidocr-onnxruntime`
The strange part is that the rapidocr-onnxruntime package is already installed on the Ubuntu system (I re-verified this in the GitHub Actions runner logs, where all the packages from requirement.txt are installed).
I can't understand why, in prod, pypdf with `extract_images=True` throws the error above.
It would be helpful if you could provide any insight or a workaround for this issue.