mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Is it possible to exclude pre-trained knowledge? #425

Closed mowliv closed 5 months ago

mowliv commented 8 months ago

Is it possible for it to answer questions about the PDFs I feed in but nothing else? For my use case, if I asked it "what is a beetle?" I would want it to say something like "I don't know". I want to be sure general knowledge isn't being mixed in.

(As a side note, the instructions in the README file say to ask questions on Discord, but nobody seems to have answered questions in the Discord channel for the last several weeks.)

dosubot[bot] commented 8 months ago

🤖

Hello @mowliv,

I'm here to assist you with your questions about the gpt4-pdf-chatbot-langchain repository. I'm a bot designed to help solve bugs, answer questions, and guide you in becoming a contributor while you're waiting for a human maintainer.

Regarding your question about excluding pre-trained knowledge: yes, it is possible to configure gpt4-pdf-chatbot-langchain to answer questions based only on the PDFs you provide. The code creates a vector store from an existing index using the PineconeStore.fromExistingIndex() function. That index is built from the ingested PDFs, and the chatbot retrieves matching passages from this vector store to generate responses to your questions.

However, note that the OpenAIEmbeddings class in the code is only used to turn text into vectors for retrieval; it is not what would add general knowledge to an answer. Any pre-trained general knowledge comes from the chat model itself, so to exclude it you would most likely need to adjust the question-answering prompt so the model is told to answer only from the retrieved context and to say it doesn't know otherwise. Unfortunately, I couldn't find specific instructions on how to do this within the repository.
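
One common way to keep general knowledge out of answers is to constrain the prompt and the retrieval step. The TypeScript below is only a minimal sketch of that idea, not the repository's code and not a LangChain API: `RetrievedChunk`, `selectRelevant`, `buildRestrictedPrompt`, `MIN_SCORE`, `FALLBACK_ANSWER`, and the prompt wording are all hypothetical names and values chosen for illustration. It assumes the similarity search returns a score between 0 and 1 where higher means closer.

```typescript
// Hypothetical shape of a chunk returned by a vector-store similarity search.
interface RetrievedChunk {
  text: string;
  score: number; // assumed similarity in [0, 1], higher = closer to the question
}

// Assumed cutoff for "relevant enough"; tune it for your own documents.
const MIN_SCORE = 0.75;

const FALLBACK_ANSWER = "I don't know.";

// Keep only chunks that are similar enough to the question.
function selectRelevant(
  chunks: RetrievedChunk[],
  minScore: number = MIN_SCORE,
): RetrievedChunk[] {
  return chunks.filter((c) => c.score >= minScore);
}

// Build a prompt that tells the model to answer from the context only.
// Returns null when nothing relevant was retrieved, so the caller can
// reply with FALLBACK_ANSWER without calling the model at all.
function buildRestrictedPrompt(
  question: string,
  chunks: RetrievedChunk[],
): string | null {
  const relevant = selectRelevant(chunks);
  if (relevant.length === 0) return null;
  const context = relevant.map((c) => c.text).join("\n---\n");
  return [
    "Answer the question using ONLY the context below.",
    `If the answer is not in the context, reply exactly: "${FALLBACK_ANSWER}"`,
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Example: an off-topic question retrieves only a weak match.
const offTopic = buildRestrictedPrompt("What is a beetle?", [
  { text: "Quarterly revenue rose 4%.", score: 0.31 },
]);
console.log(offTopic === null ? FALLBACK_ANSWER : offTopic);
```

The score cutoff handles questions that match nothing in the PDFs, while the prompt instruction covers cases where loosely related passages are retrieved but do not contain the answer. A prompt instruction cannot fully guarantee the model never draws on general knowledge, so it is worth testing with off-topic questions.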

As for the discord channel, I'm sorry to hear that you've had trouble getting responses there. I'm here to help you as best I can in the meantime.

If you have any more questions or need further clarification, feel free to ask!

Best, Dosu


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 5 months ago

Hi, @mowliv,

I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog and am marking this issue as stale. From what I understand, you want to exclude pre-trained knowledge from the model's responses when using it to answer questions about specific PDFs. Dosubot provided guidance on configuring the chatbot to achieve this and also acknowledged the issue with the unresponsive Discord channel.

Could you please confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository? If it is, please let the gpt4-pdf-chatbot-langchain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

mowliv commented 5 months ago

Yes -- working well for me in a fork of the repo! 7000 PDF documents.

dosubot[bot] commented 5 months ago

🤖

Thank you, @mowliv, for closing the issue! Your contribution is much appreciated.
