mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Does it support a variety of multilingual documents and PDFs? #369

Closed: YIN-Renlong closed this issue 4 months ago

YIN-Renlong commented 12 months ago

First of all, I want to express my gratitude for this project.

Does handling a mix of multilingual documents or PDFs require any special processing or changes to the document pipeline? Or is the query expected to be posed in one particular language?

For context, here is a related exchange from the YouTube comments:

It works SO GOOOD with english pdfs, but when i try non-english it struggles a lot, any idea why?

OpenAI's embedding function isn't optimized for multilingual. You can try Cohere's embedding api instead.

Is there a way to integrate it into this project?

Thanks.
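To make the integration question concrete, here is a minimal sketch of what a swap to Cohere embeddings might look like. It is not code from this repository. It assumes Cohere's v1 embed REST endpoint, the model name `embed-multilingual-v3.0`, and a runtime with a global `fetch` (Node 18+). The class and helper names are hypothetical; the class only mirrors the `embedDocuments` / `embedQuery` method pair that LangChain embeddings objects expose.

```typescript
// Sketch only: hypothetical names, not part of gpt4-pdf-chatbot-langchain.
// Assumed: Cohere's v1 embed endpoint, the model name below, global fetch.

type InputType = "search_document" | "search_query";

interface EmbedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure helper: builds the HTTP request without sending it.
function buildCohereEmbedRequest(
  apiKey: string,
  texts: string[],
  inputType: InputType,
  model = "embed-multilingual-v3.0", // assumed model name
): EmbedRequest {
  return {
    url: "https://api.cohere.ai/v1/embed",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ texts, model, input_type: inputType }),
    },
  };
}

// Offers the same two methods that LangChain embeddings classes provide.
class CohereMultilingualEmbeddings {
  constructor(private readonly apiKey: string) {}

  private async embed(texts: string[], inputType: InputType): Promise<number[][]> {
    const { url, init } = buildCohereEmbedRequest(this.apiKey, texts, inputType);
    const res = await fetch(url, init);
    if (!res.ok) throw new Error(`Cohere embed request failed: ${res.status}`);
    const data = (await res.json()) as { embeddings: number[][] };
    return data.embeddings;
  }

  // Embeds document chunks at ingestion time.
  embedDocuments(texts: string[]): Promise<number[][]> {
    return this.embed(texts, "search_document");
  }

  // Embeds a single user question at query time.
  async embedQuery(text: string): Promise<number[]> {
    const [vector] = await this.embed([text], "search_query");
    return vector;
  }
}
```

An instance of such a class would stand in wherever the project currently constructs its OpenAI embeddings, at both ingestion and query time. The vector dimension will likely differ from OpenAI's, so the vector index would probably need to be recreated and the documents re-ingested.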

dosubot[bot] commented 8 months ago

Hi, @YIN-Renlong! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened this issue to discuss the support for multilingual documents and PDFs in the project. You were seeking guidance on any specific requirements or enhancements for handling multilingual documents and inquiring about the possibility of integrating with Cohere's embedding API for improved multilingual support.

However, there haven't been any comments or updates on the issue since you opened it.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the project!

anjopater commented 8 months ago

@YIN-Renlong I did some tests using a document in another language PT-BR and it worked well for me.

dosubot[bot] commented 8 months ago

@mayooear Could you please help @YIN-Renlong with the issue marked as stale? They have indicated that the issue is still relevant and provided an update on their tests using a document in another language (PT-BR). Thank you!

dosubot[bot] commented 5 months ago

Hi, @YIN-Renlong

I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog and am marking this issue as stale.

It looks like you inquired about multilingual document support and the possibility of integrating with Cohere's embedding API for improved multilingual support. Anjopater confirmed successful tests using a document in another language (PT-BR), and I prompted for an update from you to determine the relevance of the issue.

Could you please confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository? If it is, please let the gpt4-pdf-chatbot-langchain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.