nosia-ai / nosia

Nosia is a platform that allows you to run an AI model on your own data. It is designed to be easy to install and use.
https://guides.nosia.ai
MIT License

Improve PDF parsing #21

Open cbldev opened 5 months ago

cbldev commented 5 months ago

Actual behavior

I noticed that on some complex PDFs with tables, pdftotext produces better results than the pdf-reader gem.

pdftotext: https://www.xpdfreader.com/pdftotext-man.html

Issue in Langchainrb: https://github.com/patterns-ai-core/langchainrb/issues/682
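
For anyone who wants to reproduce the comparison, here is a minimal sketch of calling pdftotext from a script. It assumes the `pdftotext` binary is installed and on the `PATH`. The helper names `build_pdftotext_command` and `extract_text` are hypothetical and are not part of Nosia or Langchainrb. The `-layout` flag asks pdftotext to preserve the physical page layout, which is usually what helps with tables, and a trailing `-` sends the text to stdout.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, layout: bool = True) -> List[str]:
    """Build the argv list for a pdftotext call that writes to stdout.

    Hypothetical helper for illustration only.
    """
    # Ask for UTF-8 output so the result decodes predictably.
    command = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout keeps the physical layout of the page, which tends to
        # keep table columns aligned in the text output.
        command.append("-layout")
    # A "-" in place of the output file name sends the text to stdout.
    command += [pdf_path, "-"]
    return command


def extract_text(pdf_path: str, layout: bool = True) -> Optional[str]:
    """Return the extracted text, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Running `extract_text` with `layout=True` and `layout=False` on the same file is a quick way to see how much the layout mode changes the table output.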

Expected behavior

Good results when parsing complex PDFs, including those with tables.

cbldev commented 1 week ago

Docling seems to be a better option.