h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/
http://h2o.ai
Apache License 2.0
10.94k stars 1.2k forks

OCR issue #1595

Open InesBenAmor99 opened 1 month ago

InesBenAmor99 commented 1 month ago

When I import a scanned PDF, I encounter an error (first screenshot), and the OCR option is not available for me in the expert tab. Even if I switch the PDF options from auto to on, I still encounter another error (second screenshot). What could be the problem?

image

image

pseudotensor commented 1 month ago

I recommend using the DocTR model instead of the scanned/OCR handling. Unstructured is vastly slower for OCR and less accurate.

I can't tell much from the second error you provided, as it's only a single line. I can't tell where in the code it comes from.
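
To illustrate why a single line is not enough: the one-line message only names the exception, while the full traceback also lists the files, line numbers, and functions involved. Below is a minimal, generic Python sketch of the difference. It is not h2oGPT code; `load_document` and `describe_failure` are hypothetical names made up for this example, and the error text is invented.

```python
import traceback


def load_document(path):
    # Hypothetical stand-in for a loader that fails on a scanned PDF.
    raise RuntimeError(f"could not extract text from {path}")


def describe_failure(func, *args):
    """Run func; return (one-line summary, full traceback), or (None, None) on success."""
    try:
        func(*args)
    except Exception as exc:
        # The short form: exception type and message only.
        summary = f"{type(exc).__name__}: {exc}"
        # The long form: the whole call stack leading to the failure.
        full = traceback.format_exc()
        return summary, full
    return None, None


summary, full = describe_failure(load_document, "scan.pdf")
print(summary)  # the single line, with no location information
print(full)     # the full traceback, with file names, line numbers, and function names
```

The second printout is the part that shows where in the code the failure originated, so pasting the complete traceback as text (rather than a screenshot of the last line) makes the error much easier to diagnose.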