Open Yuxuan1998 opened 2 weeks ago
Any update on this? I am also getting the same issue. I have both Poppler and Tesseract installed on my Windows PC.
@Yuxuan1998 try this; it resolved my issue. My Tesseract was installed at the path shown below, but the global variable tesseract_cmd in pytesseract was set to just tesseract. You can see it by opening the pytesseract.py file under unstructured_pytesseract in your .env folder.
import unstructured_pytesseract
unstructured_pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
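If you would rather not hard-code the install location, here is a rough sketch that looks the executable up first and only falls back to a known path. The helper name find_executable is made up for this example, and the fallback path is simply the one from the snippet above, so adjust both to your setup.

```python
import os
import shutil


def find_executable(name, extra_candidates=()):
    """Return a path to `name`, checking PATH first, then the fallback paths."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in extra_candidates:
        # Fallback locations are assumptions; only accept ones that exist.
        if os.path.isfile(candidate):
            return candidate
    return None


# The fallback below is the Windows install path quoted in this thread.
tesseract_path = find_executable(
    "tesseract", [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]
)

try:
    import unstructured_pytesseract
except ImportError:
    # Package not installed in this environment; nothing to configure.
    unstructured_pytesseract = None

if tesseract_path and unstructured_pytesseract is not None:
    # Same assignment as above, but with the resolved path.
    unstructured_pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

This has to run before the loader is called, since the assignment only changes the module-level variable for the current Python process.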
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Description
I'm trying to use UnstructuredPDFLoader to load a PDF but encounter the errors mentioned above.
PDFInfoNotInstalledError
- which pdfinfo: it returns the correct path, /opt/homebrew/Cellar/poppler/24.04.0_1/bin/pdfinfo ✅
- poppler --version: I get zsh: command not found: poppler, and this happens on my other laptops as well ❓
- poppler_path: changed from None to the path: poppler_path: Union[str, PurePath] = "/opt/homebrew/Cellar/poppler/24.04.0_1/bin/" (in ./.venv/lib/python3.11/site-packages/pdf2image/pdf2image.py)
TesseractNotFoundError
- which tesseract: it returns the correct path, /opt/homebrew/bin/tesseract ✅
- tesseract --version: it returns the correct version ✅
- tesseract_cmd: changed from 'tesseract' to the path: tesseract_cmd = '/opt/homebrew/Cellar/tesseract/5.4.1/bin/tesseract' (in ./.venv/lib/python3.11/site-packages/unstructured_pytesseract/pytesseract.py)
System Info
Package Information
platform mac
Python 3.11.3
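A possible way to narrow this down without editing files under site-packages: check what the Python process itself can resolve, since an IDE or notebook kernel may not inherit the same PATH as the zsh session where which succeeds. Note that Poppler installs utilities such as pdfinfo and pdftoppm rather than a command named poppler, so the failing poppler --version check may not mean anything is wrong. The sketch below is only an illustration; which_all and prepend_to_path are hypothetical helper names, and the directories in the commented-out calls are the ones quoted in the description above.

```python
import os
import shutil


def which_all(tools=("pdfinfo", "pdftoppm", "tesseract")):
    """Map each tool name to the path this Python process resolves, or None."""
    return {tool: shutil.which(tool) for tool in tools}


def prepend_to_path(directory, env=None):
    """Add `directory` to the front of PATH in `env` unless it is already listed."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Report what the interpreter (not the shell) can see.
for tool, path in which_all().items():
    print(f"{tool}: {path or 'NOT FOUND from this interpreter'}")

# If a tool is missing here but present in the shell, one option is to extend
# PATH for this process before calling the loader. Directories are assumptions
# taken from the description above.
# prepend_to_path("/opt/homebrew/Cellar/poppler/24.04.0_1/bin")
# prepend_to_path("/opt/homebrew/bin")
```

If everything resolves here and the errors still appear, the PATH difference is probably not the cause and the problem lies elsewhere.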