mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

image in pdf #417

Closed: flckv closed this issue 4 months ago

flckv commented 8 months ago

Thanks for sharing the code. How do you handle images when uploading to the vector database, and when retrieving text that explains an image, graph, or figure?

dosubot[bot] commented 8 months ago

🤖

Hello,

Thank you for your question. The current implementation of the gpt4-pdf-chatbot-langchain codebase primarily focuses on text extraction from PDF documents. It uses the PDFLoader class from the langchain/document_loaders/fs/pdf module to load and extract text from PDF files.

As for handling images, graphs, or figures within the PDF documents, the current version of the codebase does not explicitly handle or process these elements. The PDFLoader class is designed to extract text content and does not have built-in functionality for image processing or extraction.

If you're interested in extracting text that explains an image, graph, or figure, you would need to ensure that this text is not part of the image itself but is included as separate, selectable text within the PDF. If the text is part of the image, it would not be extracted with the current implementation.
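
As a rough illustration of working with explanatory text that is already selectable, the sketch below scans extracted page text for caption-like lines and records them in the page metadata. It is not part of the repository: the `PageDoc` shape, the `findCaptions` and `tagCaptions` helpers, and the caption pattern ("Figure 1:", "Fig. 2.", "Table 3 -") are all assumptions made for this example, and the pattern is only a heuristic.

```typescript
// Hypothetical document shape for this sketch: page text plus free-form metadata.
interface PageDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Caption {
  label: string; // e.g. "Figure 3"
  text: string; // the caption text that follows the label
}

// Assumed caption convention: a line that starts with a label such as
// "Figure 1:", "Fig. 2." or "Table 3 -". Real PDFs may use other styles.
const CAPTION_PATTERN =
  /^\s*((?:Figure|Fig\.?|Table|Chart|Graph)\s+\d+)\s*[:.\-]\s*(.+)$/i;

// Collect every caption-like line from one page of extracted text.
function findCaptions(doc: PageDoc): Caption[] {
  const captions: Caption[] = [];
  for (const line of doc.pageContent.split(/\r?\n/)) {
    const match = CAPTION_PATTERN.exec(line);
    if (match) {
      captions.push({ label: match[1], text: match[2].trim() });
    }
  }
  return captions;
}

// Return a copy of the page with its captions stored under metadata.captions,
// so they can be inspected or filtered on after retrieval.
function tagCaptions(doc: PageDoc): PageDoc {
  return { ...doc, metadata: { ...doc.metadata, captions: findCaptions(doc) } };
}
```

Running `tagCaptions` over each page before splitting and embedding would keep the caption text attached to its page. Text that exists only as pixels inside the image is still invisible to this approach.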

If you're looking to extend the functionality to handle images, you might consider integrating an OCR (Optical Character Recognition) library to extract text from images within the PDF. This would involve significant modifications to the PDFLoader class or the creation of a new loader class to handle this functionality.
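
One possible shape for such an extension, sketched under stated assumptions: pages whose text layer is nearly empty are passed to an OCR step, and all other pages are left as they are. The `PageDoc` type, the `needsOcr` and `fillWithOcr` helpers, the 20-character threshold, and the `OcrPage` callback are hypothetical names introduced for this example. The callback stands in for whatever OCR engine you choose; no particular OCR library's API is assumed here.

```typescript
// Hypothetical document shape for this sketch: page text plus free-form metadata.
interface PageDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical OCR hook: receives a 1-based page number and resolves to the
// recognized text. A real implementation would render the page to an image
// and hand it to an OCR engine of your choice.
type OcrPage = (pageNumber: number) => Promise<string>;

// Assumed threshold: pages with fewer non-whitespace-trimmed characters than
// this are treated as scanned or image-only.
const MIN_TEXT_CHARS = 20;

function needsOcr(doc: PageDoc, minChars: number = MIN_TEXT_CHARS): boolean {
  return doc.pageContent.trim().length < minChars;
}

// Walk the pages in order (assumed to match the PDF page order) and replace
// the content of sparse pages with OCR output, marking where the text came from.
async function fillWithOcr(
  docs: PageDoc[],
  ocrPage: OcrPage,
  minChars: number = MIN_TEXT_CHARS
): Promise<PageDoc[]> {
  const result: PageDoc[] = [];
  let pageNumber = 0;
  for (const doc of docs) {
    pageNumber += 1;
    if (!needsOcr(doc, minChars)) {
      result.push(doc);
      continue;
    }
    const recognized = await ocrPage(pageNumber);
    result.push({
      pageContent: recognized,
      metadata: { ...doc.metadata, textSource: "ocr" },
    });
  }
  return result;
}
```

This only covers pages that are entirely scanned. A page that mixes selectable text with text embedded in figures would pass the threshold and skip OCR, so handling that case would need per-image extraction instead of a per-page check.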

I hope this answers your question. If you have further queries or need clarification on any point, feel free to ask.

Best regards.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

dosubot[bot] commented 5 months ago

Hi, @flckv

I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog and am marking this issue as stale. The issue you opened pertains to managing images within a PDF, particularly in the context of uploading to a vector database and retrieving explanatory text for images, graphs, or figures. I provided a detailed response, explaining that the current implementation of the gpt4-pdf-chatbot-langchain codebase primarily focuses on text extraction from PDF documents and does not handle images, graphs, or figures explicitly. I suggested integrating an OCR library to extract text from images within the PDF, which would involve significant modifications to the existing codebase.

Could you please confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository? If it is, please let the gpt4-pdf-chatbot-langchain team know by commenting on the issue. Otherwise, feel free to close the issue yourself or the issue will be automatically closed in 7 days.

Thank you!