rmusser01 / tldw

tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer'
Apache License 2.0

Improvement: Add ability to ingest Office documents #44

Closed: rmusser01 closed this issue 4 months ago

rmusser01 commented 5 months ago

As a user, I would like to be able to select / upload a document, have the text content of the document extracted, chunked (if necessary), and then summarized appropriately.
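A rough sketch of the chunk-then-summarize part of this request, assuming the text has already been extracted from the document. The names `chunk_text` and `summarize_document`, the word-based chunk size, and the overlap are all hypothetical choices for illustration, not anything that exists in tldw. The `summarize` argument stands in for whatever summarizer ends up being used.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into word-based chunks, each sharing `overlap` words with the previous one."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    if not words:
        return []
    if len(words) <= max_words:
        # Short documents do not need chunking.
        return [" ".join(words)]

    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize_document(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 300,
    overlap: int = 30,
) -> str:
    """Summarize each chunk with the supplied callable and join the partial summaries."""
    partial = [summarize(chunk) for chunk in chunk_text(text, max_words, overlap)]
    return "\n".join(partial)


if __name__ == "__main__":
    # Stand-in summarizer (an assumption): keep only the first five words of each chunk.
    def first_words(chunk: str) -> str:
        return " ".join(chunk.split()[:5])

    sample = " ".join(f"word{i}" for i in range(20))
    print(summarize_document(sample, first_words, max_words=8, overlap=2))
```

The overlap is there so a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. Splitting on sentences or tokens instead of words may suit an LLM context limit better.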

Documents:
https://blog.streamlit.io/langchain-tutorial-3-build-a-text-summarization-app/
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb

rmusser01 commented 5 months ago

https://pypi.org/project/sumy/

rmusser01 commented 4 months ago

Closing, since marker is now being used for this.