zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0
54.11k stars, 7.28k forks

Language support #195

Closed. PierreVannier closed this issue 1 year ago.

PierreVannier commented 1 year ago

Hello there

I'd like to ingest French documents with this project and query them. It seems to me that the suggested models only work with English documents; am I right? Does anyone have suggestions for running it on documents that are not written in English? I assume one must download a GPT4All-compatible model. Where can these be found? Is one available for French?

Thanks for any pointers.

P.S. This seems to be a frequent question, but I haven't found any convincing suggestions so far.

PierreMory commented 1 year ago

Hello Pierre,

You can use https://github.com/bofenghuang/vigogne which is a French LLM compatible with llama.cpp.

diamondbarcode commented 1 year ago

I tested it with a document written in Khmer (the language of Cambodia), encoded in Unicode, and it did not work well: I got an invalid token error. I hope this can be made to work. In ChatGPT itself, I can use the language fine.

Traceback (most recent call last):
  File "A:\vscodes\privateGPT\ingest.py", line 62, in <module>
    main()
  File "A:\vscodes\privateGPT\ingest.py", line 56, in main

PierreVannier commented 1 year ago

Hey Pierre,

Thanks for the heads-up! How do I do that exactly? I can't find a ready-to-use Vigogne model. Have you gone through the procedure yourself? Thanks.

PierreMory commented 1 year ago

You can find more details on this page: https://github.com/bofenghuang/vigogne/tree/main/vigogne/inference#llamacpp
This tutorial (in French) explains how to create the model, but I downloaded it directly from this Discord channel: https://discord.com/channels/1092039071435599874/1101966544906485800

I also recommend changing the model used for embeddings. I get better results with SentenceTransformers (https://python.langchain.com/en/latest/modules/models/text_embedding/examples/sentence_transformers.html). I used this multilingual model: paraphrase-multilingual-mpnet-base-v2 (https://www.sbert.net/docs/pretrained_models.html)
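
As a rough sketch of the embeddings change described above (not the exact code used in this thread): the snippet assumes a LangChain version that exposes `HuggingFaceEmbeddings` under `langchain.embeddings` (newer releases moved it to other packages). The `EMBEDDINGS_MODEL_NAME` environment variable and both helper names are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Multilingual SentenceTransformers model mentioned above.
DEFAULT_EMBEDDINGS_MODEL = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def resolve_embeddings_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the embeddings model name, preferring an environment override.

    EMBEDDINGS_MODEL_NAME is an assumed variable name, used here for illustration.
    """
    env = os.environ if env is None else env
    return env.get("EMBEDDINGS_MODEL_NAME") or DEFAULT_EMBEDDINGS_MODEL


def load_embeddings(model_name: Optional[str] = None):
    """Build a LangChain embeddings object for the given model.

    The import is deferred so the helper above works without LangChain installed.
    The import path is an assumption and depends on the LangChain version.
    """
    from langchain.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name=model_name or resolve_embeddings_model())
```

The same embeddings object has to be used for both ingestion and querying, and the vector store must be rebuilt after switching models, since vectors produced by different models are not comparable.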

danielwiegand commented 1 year ago

Does somebody know a compatible German LLM?

PierreVannier commented 1 year ago

Wow, great content, Pierre! I have more than enough material to keep me busy for another couple of weekends! 😆 I'll let you know how it goes. Thanks a lot.

PierreVannier commented 1 year ago

@PierreMory, I followed the Pere Conteur tutorial and ingested a bunch of French PDFs, but is it normal that it replies in English when queried?

PierreMory commented 1 year ago

I encountered the same problem. I managed to get French answers by customizing the prompt given to the chain. Here is my code:

from langchain.prompts import PromptTemplate

prompt_template = """Instructions: Use the following pieces of context to answer the question at the end. If you cannot answer with the given context, or if you don't know the answer, just say that you don't know, don't try to make up an answer. Always answer in French.

Context:
{context}

Question: 
{question}

Answer (in French):"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["question", "context"]
)

Then, later in the code (RetrievalQA is imported from langchain.chains):

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT},
    )

With that prompt you should get French answers!
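
The same idea can be generalized to other languages (for example Italian or German). The sketch below builds the template text for an arbitrary answer language. `build_prompt_template` is a hypothetical helper, not part of privateGPT or LangChain, and whether a given model actually follows the instruction depends on the model.

```python
def build_prompt_template(language: str) -> str:
    """Return a RetrievalQA-style prompt that asks for answers in `language`.

    The {context} and {question} placeholders are left intact so the string
    can be passed to langchain.prompts.PromptTemplate as in the code above.
    """
    return (
        "Instructions: Use the following pieces of context to answer the "
        "question at the end. If you cannot answer with the given context, "
        "or if you don't know the answer, just say that you don't know, "
        "don't try to make up an answer. "
        f"Always answer in {language}.\n\n"
        "Context:\n{context}\n\n"
        "Question:\n{question}\n\n"
        f"Answer (in {language}):"
    )


if __name__ == "__main__":
    template = build_prompt_template("German")
    # Fill the placeholders by hand to preview what the model would receive.
    print(template.format(context="<retrieved chunks>", question="<user question>"))
```

Passing `build_prompt_template("German")` as `template=` to `PromptTemplate`, with `input_variables=["question", "context"]`, would then take the place of the hard-coded French prompt.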

PierreVannier commented 1 year ago

Hi Pierre, sorry for the late reply. Yes, I figured this out as well. It's working now. Have you tried adding documents after the initial ingestion phase?

PierreMory commented 1 year ago

No. When I want to add more data, I drop the DB and run ingest.py again. It is not a real problem, as the sentence-embedding model I am using is pretty fast.
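
The drop-and-reingest routine described above could be scripted roughly as follows. The `db` directory name and the `reset_and_reingest` helper are assumptions for illustration; use whatever persist directory your own setup is configured with.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def reset_and_reingest(persist_dir: str = "db", ingest_script: str = "ingest.py",
                       run_ingest: bool = True) -> None:
    """Delete the vector store directory, then rebuild it from scratch."""
    db_path = Path(persist_dir)
    if db_path.is_dir():
        # Remove every stored embedding so stale documents cannot linger.
        shutil.rmtree(db_path)
    if run_ingest:
        # Re-run ingestion with the same interpreter; raises if it fails.
        subprocess.run([sys.executable, ingest_script], check=True)
```

Calling `reset_and_reingest()` from the project root would wipe the store and re-run ingestion over all source documents. The cost grows with the size of the corpus, so this only stays practical while embedding is fast.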

NukeDev commented 1 year ago

Hello, how can I use it in Italian too?

Thanks

Mer0me commented 1 year ago

Could anyone post an updated tutorial on how to use a French LLM with privateGPT? The PereConteur tutorial doesn't seem to work here. Can we download the .bin (and if so, where) and only change the .env?

Uddeshya1052 commented 1 month ago

@PierreMory @PierreVannier how can I use privateGPT for the German language? It always gives answers in English.