Closed nidhoggr-nil closed 1 year ago
+1 to this.
There are several attempts at "chat with multiple PDFs", usually based on llamaindex or langchain, but I am still missing a front-end that helps with the PDF manipulation.
I feel that gradio will be a limiting factor here: the original PDF content can only be shown with hacks (found here).
I've looked at this approach (pdfGPT) as well, which might generate better answers.
I'm considering whether I could integrate it into text-gen and then use something like pdf.js or pdf-lib to display the results, but I won't have time to try it for a few weeks. When the time comes to implement persistence, I would use chroma as the storage backend for the embeddings.
I will try to just embed pdf.js directly in a simple frame in gradio and list the results. From a UI implementation perspective it seems simple; the tricky part is how easily the references can be integrated into pdf.js to allow bi-directional linking and highlighting, plus creating PDF objects on the fly for result highlighting. I might even do text selection in the PDF, to ask about a specific part, but that is for the future.
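To make the "simple frame" idea concrete, here is a rough sketch of building the iframe markup that a gradio HTML component could display. It is not tested against gradio or text-gen. The `pdfjs_iframe` helper, the `viewer_url` default and the example paths are all hypothetical, and it assumes the stock pdf.js viewer is served somewhere and honours the `file` query parameter and the `#page=` / `#search=` hash parameters.

```python
from html import escape
from urllib.parse import quote


def pdfjs_iframe(pdf_url, page=1, search=None,
                 viewer_url="/file/pdfjs/web/viewer.html", height=800):
    """Return an <iframe> snippet that opens pdf_url in a pdf.js viewer.

    viewer_url is a hypothetical location for the stock pdf.js viewer;
    adjust it to wherever the viewer files are actually served from.
    """
    # Hash parameters tell the viewer which page to open and, optionally,
    # which term to search for (assumed pdf.js viewer behaviour).
    fragment = f"page={int(page)}"
    if search:
        fragment += f"&search={quote(search)}"
    # The PDF location is passed as a fully percent-encoded query value.
    src = f"{viewer_url}?file={quote(pdf_url, safe='')}#{fragment}"
    # Escape the URL so it is safe inside an HTML attribute.
    return (f'<iframe src="{escape(src, quote=True)}" '
            f'width="100%" height="{int(height)}"></iframe>')


snippet = pdfjs_iframe("/file/docs/paper.pdf", page=3, search="attention heads")
```

The resulting string could then be handed to something like an HTML component. This only covers jumping from a result to a page; linking back from the PDF to the chat would need messaging between the frame and the page, which this sketch does not attempt.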
But I also need to figure out how big the impact of using chroma is, because it's a whole database, and I don't know if you can somehow package chroma so it doesn't become yet another service that needs to run. Otherwise I will look into sqlite and see if I can adapt the format so that the dependencies are kept small while still getting the guarantees a database offers.
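As a rough idea of what the sqlite fallback could look like, here is a minimal sketch using only the Python standard library. The table layout, the function names and the JSON encoding of the vectors are assumptions made for illustration, not anything text-gen or chroma uses, and the brute-force cosine search would only be reasonable for a small number of chunks.

```python
import json
import math
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file store for text chunks and embeddings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, doc TEXT NOT NULL, page INTEGER NOT NULL, "
        "text TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn


def add_chunk(conn, doc, page, text, embedding):
    """Insert one chunk; the vector is stored as a JSON array (assumed format)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO chunks (doc, page, text, embedding) VALUES (?, ?, ?, ?)",
            (doc, page, text, json.dumps(list(embedding))),
        )


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_embedding, k=3):
    """Return the k most similar chunks as (score, doc, page, text) tuples."""
    rows = conn.execute("SELECT doc, page, text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_embedding, json.loads(emb)), doc, page, text)
              for doc, page, text, emb in rows]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


store = open_store()
add_chunk(store, "a.pdf", 1, "cats", [1.0, 0.0])
add_chunk(store, "b.pdf", 5, "kittens", [0.9, 0.1])
top = search(store, [1.0, 0.0], k=1)
```

Keeping the document name and page number next to each vector is what would let a result link back to a place in the PDF. The trade-off is that every query scans the whole table, which is the kind of work a dedicated vector store would otherwise handle.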
And I think it will be a good idea to become acquainted with the code for superbooga.
@Osigmae take a look at this approach.
@Osigmae this would be something I would love to see...:
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
any progress?
Hey, I'm still looking into it. I've been fiddling with langchain, chromadb, PDF scanner libraries and tesseract OCR for math, and got stuff working at the prototype stage, but my summer has been busier than anticipated 😅 I can't promise anything timeline-wise, so hopefully someone with more time than me makes a solution somewhere; otherwise I'm just going to work on this when I have spare time for it here and there.
Reading this through discussions on PrivateGPT or GPT4All or Khoj (or some other bolt-on solution) https://github.com/oobabooga/text-generation-webui/discussions/1372
anything new?
One of the very big things missing for text-gen, I feel, is a clean interface and functionality for document upload, and maybe even document manipulation in the long term, along with the integration of that into the rest of the software.
I propose that the user must be able to:
On the technical side, I propose that:
I can see that there's some development on integrating langchain here, which might help this issue: Integrate with gpt-index/langchain #665