yineza7 / Summarization-of-a-stack-of-papers-using-LLMs-

Summarizing a stack of papers involves systematically reviewing and condensing the key information from each paper to provide a concise overview. It may be feasible to produce the summary of a stack of papers in two stages.
MIT License

Converting pdf to text #8

Closed yineza7 closed 5 months ago

yineza7 commented 6 months ago

T5 so far looks like a text-to-text model; however, our objective is to go from document to text. Could you find a model or write a script to extract text from documents (try PDF versions only)?

ColinThomas1 commented 6 months ago

https://medium.com/nerd-for-tech/convert-pdf-to-csv-using-python-b94fbef82155

Obasjoe commented 6 months ago

https://stackoverflow.com/questions/42093548/splitting-pdf-files-into-paragraphs

https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/

yineza7 commented 5 months ago

@ColinThomas1 did you end up starting to script this?

yineza7 commented 5 months ago

PDFtoText.py
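
The contents of the attached script are not shown in this thread. For readers who want a starting point, here is a minimal sketch of a PDF-to-text step. It is not the attached PDFtoText.py. It assumes the third-party `pypdf` package is installed (`pip install pypdf`), and the names `clean_page_text` and `pdf_to_text` are hypothetical, chosen only for illustration. Scanned PDFs with no text layer would need OCR instead, which this sketch does not cover.

```python
import re
import sys


def clean_page_text(raw: str) -> str:
    """Tidy the raw text extracted from one PDF page (hypothetical helper)."""
    # Rejoin words split by a hyphen at a line break: "summa-\nrization" -> "summarization".
    # Caveat: this also merges genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space, keeping line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line and drop lines that are empty after trimming.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def pdf_to_text(pdf_path: str) -> str:
    """Return the text of every page in the PDF, pages separated by a blank line."""
    # Imported here so that clean_page_text stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only pages.
    pages = [clean_page_text(page.extract_text() or "") for page in reader.pages]
    return "\n\n".join(pages)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(pdf_to_text(sys.argv[1]))
    else:
        print("usage: python pdf_to_text_sketch.py paper.pdf")
```

The resulting plain text could then be passed to a text-to-text model such as T5, likely in chunks, since a full paper usually exceeds the model's input length.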