Open rmusser01 opened 1 month ago
https://unstract.com/blog/comparing-approaches-for-using-llms-for-structured-data-extraction-from-pdfs/
https://unstract.com/blog/pdf-hell-and-practical-rag-applications/
https://neuml.github.io/txtai/usecases/#retrieval-augmented-generation
https://github.com/Zipstack/unstract
https://github.com/Filimoa/open-parse
As a user, I would like to be able to select or upload a PDF document and have its text content extracted, chunked (if necessary), and then summarized appropriately. The result should be ingested into the DB, with the option to add keywords to the document.
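A minimal sketch of the chunk-and-ingest part of this flow, to make the request concrete. It assumes the text has already been extracted from the PDF by one of the tools listed in this issue, and that the summary comes from some separate summarizer; neither step is shown. The function names `chunk_text` and `ingest`, the word-based chunking with overlap, and the `documents`/`chunks` table layout are all hypothetical and not taken from the project.

```python
import sqlite3


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks, repeating `overlap` words between neighbors."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def ingest(conn: sqlite3.Connection, title: str, chunks: list[str],
           summary: str, keywords: list[str]) -> int:
    """Store a document, its chunks, and optional keywords (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id INTEGER PRIMARY KEY, title TEXT, summary TEXT, keywords TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (doc_id INTEGER, position INTEGER, body TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO documents (title, summary, keywords) VALUES (?, ?, ?)",
        (title, summary, ",".join(keywords)),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO chunks (doc_id, position, body) VALUES (?, ?, ?)",
        [(doc_id, i, body) for i, body in enumerate(chunks)],
    )
    conn.commit()
    return doc_id


if __name__ == "__main__":
    extracted = "text extracted from the PDF would go here " * 100
    pieces = chunk_text(extracted, max_words=200, overlap=20)
    db = sqlite3.connect(":memory:")
    # The summary string is a placeholder for whatever the summarizer returns.
    doc_id = ingest(db, "example.pdf", pieces, "placeholder summary", ["pdf", "example"])
    print(doc_id, len(pieces))
```

Word-based chunking is only one option; splitting by page, heading, or token count may suit the summarizer better, depending on which extraction tool is chosen.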
PDF Tools
https://github.com/VikParuchuri/surya
https://github.com/nlmatics/llmsherpa
https://github.com/Stirling-Tools/Stirling-PDF
https://www.pdftool.org/en
https://github.com/VikParuchuri/marker
https://blog.dagworks.io/p/containerized-pdf-summarizer-with
https://ai.gopubby.com/demystifying-pdf-parsing-02-pipeline-based-method-82619dbcbddf?gi=5de928644ec4
https://github.com/tesseract-ocr/tesseract
https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis
https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/
https://github.com/apache/tika
http://mccormickml.com/2024/01/30/summarizing-long-pdfs-with-chatgpt/
https://github.com/nlmatics/nlm-ingestor
https://github.com/Filimoa/open-parse - extract tables
https://archive.is/p0cLQ