mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

meta indexing #104

Closed: text2sql closed this issue 1 year ago

text2sql commented 1 year ago

@mayooear How do you enable what I'd call, for lack of a better name, meta indexing? I mean the chatbot understanding which document is which and being able to compare them, like in your video comparing TSLA 2020 vs 2021 or something like that. Thank you!

jagobagascon commented 1 year ago

I'm having a similar problem. I'm ingesting several files, and when I ask a question that involves data from several (or all) of them, the chatbot is not capable of giving a good answer.

Warning: I still don't fully understand how LangChain works, so I may say something that is not correct.

Let me know if I'm mistaken, but this is what I think is happening: first we use the given question to find the relevant document pieces in the vector store, then we send those pieces as context, together with the question, to the OpenAI API. If the question is too vague, or if it involves lots of documents, then the retrieved context is not enough for the chatbot to respond.
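To make the flow described above concrete, here is a minimal sketch of retrieve-then-ask. It is not the repository's code and does not use LangChain: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, word counts stand in for a real embedding model, and the chunks are made-up examples.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Step 1: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list) -> str:
    """Step 2: only the retrieved chunks are sent to the model as context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Made-up document pieces, as if split out of the ingested files.
chunks = [
    "Solar panel subsidy: covers home solar installations.",
    "Electric vehicle subsidy: covers part of an EV purchase.",
    "Office opening hours are nine to five on weekdays.",
]

question = "What does the electric vehicle subsidy cover?"
context = retrieve(question, chunks, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The model never sees anything outside `context`, so whatever the retrieval step leaves out cannot appear in the answer.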

My case involves several documents with information about some government subsidies. If I ask the chatbot to list all the available subsidies then it just gives me a couple of them or tells me that the given context is not enough.
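A rough illustration of why a "list everything" question might come back incomplete, assuming retrieval returns only a fixed number of chunks. The file names, chunk texts, and the `top_k_per_source` helper are all hypothetical; retrieving a few chunks per source file is one possible workaround, not something the repository is known to do.

```python
import re
from collections import defaultdict

# Made-up chunks, each tagged with the file it came from.
chunks = [
    {"source": "housing.pdf", "text": "Subsidy: rental assistance for young tenants."},
    {"source": "housing.pdf", "text": "Subsidy: home insulation grant."},
    {"source": "transport.pdf", "text": "Subsidy: electric bicycle purchase aid."},
    {"source": "farming.pdf", "text": "Subsidy: irrigation modernisation fund."},
]


def score(question: str, text: str) -> int:
    """Toy relevance: number of distinct words shared by question and text."""
    words = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    return len(words(question) & words(text))


def top_k(question: str, items: list, k: int) -> list:
    """Global retrieval: the k best chunks overall (ties keep input order)."""
    return sorted(items, key=lambda c: score(question, c["text"]), reverse=True)[:k]


def top_k_per_source(question: str, items: list, k: int) -> list:
    """Per-file retrieval: the k best chunks from every source file."""
    by_source = defaultdict(list)
    for chunk in items:
        by_source[chunk["source"]].append(chunk)
    picked = []
    for source in sorted(by_source):
        picked.extend(top_k(question, by_source[source], k))
    return picked


question = "List every subsidy"
global_hits = top_k(question, chunks, k=2)
per_file_hits = top_k_per_source(question, chunks, k=1)

print(sorted({c["source"] for c in global_hits}))
print(sorted({c["source"] for c in per_file_hits}))
```

With a vague question every chunk scores about the same, so the global top 2 can come from a single file, while the per-file variant keeps at least one chunk from each document at the cost of a longer prompt.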

I think that the 2020 vs 2021 comparison works fine because the answer is part of the input PDF document itself: [Screenshot 2023-04-05 at 18 31 33]

dosubot[bot] commented 1 year ago

Hi, @text2sql! I'm Dosu, and I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are looking for guidance on enabling meta indexing in the gpt4-pdf-chatbot-langchain chatbot. This feature would allow the chatbot to understand and compare different documents. User "jagobagascon" has commented on the issue, mentioning that the chatbot is not capable of giving a good answer when asked questions involving data from multiple files. They also suggest that the retrieved context may not be enough for the chatbot to respond in certain cases.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding!