oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0
39.1k stars 5.16k forks

Document upload & querying (pdf,markdown,docx,excel etc) - backend & frontend #2440

Closed nidhoggr-nil closed 1 year ago

nidhoggr-nil commented 1 year ago

One of the big things missing from text-gen, I feel, is a clean interface and functionality for document upload, and in the long term maybe even document manipulation, integrated into the rest of the software.

I propose that the user must be able to:

On the technical side, I propose that:

I can see that there's some development on integrating langchain here, which might help this issue: Integrate with gpt-index/langchain #665

sandorkonya commented 1 year ago

+1 to this.

There are several attempts at "chat with multiple PDFs", usually based on llamaindex or langchain, but I am still missing a front-end that helps with the PDF manipulation.

I feel that gradio will be the limiting factor here: the original PDF content can only be shown with hacks - found here.

nidhoggr-nil commented 1 year ago

I've looked at this approach (pdfGPT) as well, which might generate better answers.

I'm considering integrating it into text-gen and then using something like pdf.js or pdf-lib to display the results, but I won't have time to try it for a few weeks. When the time comes to implement persistence, I would use chroma as the storage backend for the embeddings.

I will try to just embed pdf.js directly in a simple frame in gradio and list the results. From a UI implementation perspective it seems simple; the tricky part is how easily the references can be integrated into pdf.js to allow bi-directional linking and highlighting, plus creating PDF objects on the fly for result highlighting. I might even add text selection in the PDF, to ask about a specific part, but that is for the future.
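
A minimal sketch of the "simple frame" idea, assuming a prebuilt pdf.js viewer is served at `/pdfjs/web/viewer.html` (a hypothetical path) and that the resulting string is handed to an HTML component in the UI. The helper names `pdfjs_iframe` and `results_html` are made up for illustration. It only covers jumping to a page via the viewer's `#page=` fragment, not bi-directional linking or highlighting.

```python
from html import escape
from urllib.parse import quote


def pdfjs_iframe(pdf_url, page=1, viewer_url="/pdfjs/web/viewer.html", height=600):
    """Return an <iframe> snippet that opens pdf_url in a pdf.js viewer at a page.

    viewer_url is an assumed location of the prebuilt pdf.js viewer.
    """
    # The viewer receives the document through ?file= and the page through #page=.
    src = f"{viewer_url}?file={quote(pdf_url, safe='')}#page={int(page)}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="100%" height="{int(height)}"></iframe>'
    )


def results_html(pdf_url, hits):
    """Render a result list plus a viewer opened at the first hit.

    hits is a list of (page, snippet) pairs from some retrieval step.
    """
    items = "".join(f"<li>p. {int(p)}: {escape(s)}</li>" for p, s in hits)
    first_page = hits[0][0] if hits else 1
    return f"<ul>{items}</ul>" + pdfjs_iframe(pdf_url, page=first_page)
```

Linking a result to a page this way is one-directional (result to PDF); going from a selection in the PDF back to the chat would need messaging between the frame and the page, which this sketch does not attempt.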

nidhoggr-nil commented 1 year ago

But I also need to figure out how big the impact of using chroma is, because it's a whole database. I don't know if you can somehow package chroma so it doesn't become yet another service that needs to run. Otherwise I will look into sqlite and see if I can adapt the format so that the dependencies are kept small while still getting the guarantees a database offers.
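
A rough sketch of what the sqlite route could look like, using only the Python standard library. The table layout, the function names, and the brute-force cosine search are all assumptions made for illustration, not anything text-gen or superbooga does today. Embeddings are expected to come from some external model and are passed in as plain lists of floats.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): BLOB back to a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def open_store(path=":memory:"):
    """Open (or create) a single-file store; no separate service needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, doc TEXT, page INTEGER, "
        "text TEXT, embedding BLOB)"
    )
    return conn


def add_chunk(conn, doc, page, text, embedding):
    """Insert one text chunk with its embedding inside a transaction."""
    with conn:
        conn.execute(
            "INSERT INTO chunks (doc, page, text, embedding) VALUES (?, ?, ?, ?)",
            (doc, page, text, pack(embedding)),
        )


def query(conn, embedding, k=3):
    """Return the k most similar chunks as (score, doc, page, text) tuples."""
    rows = conn.execute("SELECT doc, page, text, embedding FROM chunks").fetchall()
    scored = [
        (cosine(embedding, unpack(blob)), doc, page, text)
        for doc, page, text, blob in rows
    ]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:k]
```

This scans every row on each query, so it is only reasonable for a modest number of chunks; the appeal is that persistence is a single file and there are no extra dependencies.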

nidhoggr-nil commented 1 year ago

And I think it will be a good idea to become acquainted with the code for superbooga.

sandorkonya commented 1 year ago

@Osigmae take a look at this approach.

sandorkonya commented 1 year ago

@Osigmae this would be something I would love to see: [image]

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

94bb494nd41f commented 1 year ago

any progress?

nidhoggr-nil commented 1 year ago

Hey, I'm still looking into it. I've been fiddling with langchain, chromadb, PDF scanner libraries and tesseract OCR for math, and got things working at the prototype stage, but my summer has been busier than anticipated 😅 I can't promise anything timeline-wise, so hopefully someone with more time than me makes a solution somewhere; otherwise I'm just going to work on this when I have spare time for it here and there.

TomLucidor commented 8 months ago

Reading this through discussions on PrivateGPT or GPT4All or Khoj (or some other bolt-on solution) https://github.com/oobabooga/text-generation-webui/discussions/1372

Gee1111 commented 7 months ago

anything new?