zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

How to make PrivateGPT translate everything into English before storing and processing inputs #1549

Open PayteR opened 7 months ago

PayteR commented 7 months ago

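Hi, I want to use PrivateGPT for Slovak documents, but it's not possible, because there is no LLM that can work with the Slovak language. I tried making a small test Python script that will:

* read the txt file

* translate it with `Helsinki-NLP/opus-mt-sk-en` into English

* summarize that English text with `Falconsai/medical_summarization`

* translate that summary back into Slovak with `Helsinki-NLP/opus-mt-en-sk`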

And the result was actually very good. So my suggestion is that PrivateGPT could support configurable translators, so that the LLM and the vector database only have to work with English text. In settings.yaml it could look something like this:

translations:
  from_prompt:
    model: Helsinki-NLP/opus-mt-sk-en
    # source: slk_Latn # can stay commented out when the translation model does not need these arguments
    # target: eng_Latn
  from_response:
    model: Helsinki-NLP/opus-mt-en-sk # can be omitted when it is the same as from_prompt
    # source: eng_Latn
    # target: slk_Latn
  to_store:
    model: Helsinki-NLP/opus-mt-sk-en
    # source: slk_Latn # Slovak
    # target: eng_Latn # English
  from_store:
    model: Helsinki-NLP/opus-mt-en-sk # can be omitted when it is the same as to_store
    # source: eng_Latn
    # target: slk_Latn

What do you think about this? I'm a noob in Python, so I'm not able to build it myself, thx.
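
To make the proposal more concrete, here is a rough sketch of how such a `translations` block might be consumed. Nothing here is an existing PrivateGPT setting or API: `resolve_models` and `translated_chat` are hypothetical names, and the fallback rule (an omitted `from_response` reuses the `from_prompt` model, an omitted `from_store` reuses `to_store`) is only an interpretation of the comments in the config above.

```python
from typing import Callable, Optional

TextFn = Callable[[str], str]


def resolve_models(translations: dict) -> dict:
    """Return one model name per section, applying the assumed fallbacks."""

    def model_of(section: str) -> Optional[str]:
        # A section may be missing or empty (e.g. only comments in the YAML).
        return (translations.get(section) or {}).get("model")

    from_prompt = model_of("from_prompt")
    to_store = model_of("to_store")
    return {
        "from_prompt": from_prompt,
        # Assumed fallback: reuse the prompt model when no response model is set.
        "from_response": model_of("from_response") or from_prompt,
        "to_store": to_store,
        # Assumed fallback: reuse the store model when no from_store model is set.
        "from_store": model_of("from_store") or to_store,
    }


def translated_chat(prompt: str, llm: TextFn,
                    from_prompt: TextFn, from_response: TextFn) -> str:
    """Translate the prompt to English, query the LLM, translate the answer back."""
    english_prompt = from_prompt(prompt)
    english_answer = llm(english_prompt)
    return from_response(english_answer)
```

Reusing the same model for both directions would only make sense for a multilingual model that takes `source` and `target` arguments; the one-directional `opus-mt-sk-en` and `opus-mt-en-sk` models would each need their own entry.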

gsgalezowski commented 5 months ago

Hi, I want to use PrivateGPT for Slovak documents, but it's not possible, because there is no LLM that can work with the Slovak language. I tried making a small test Python script that will:

* read the txt file

* translate it with `Helsinki-NLP/opus-mt-sk-en` into English

* summarize that English text with `Falconsai/medical_summarization`

* translate that summary back into Slovak with `Helsinki-NLP/opus-mt-en-sk`
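
The four steps above could be sketched roughly as follows. This is not the original script, only a minimal illustration that assumes the Hugging Face `transformers` library (with a backend such as PyTorch) is installed; the helper names `summarize_via_english`, `hf_stage`, and `summarize_file` are made up for this example.

```python
from typing import Callable

TextFn = Callable[[str], str]


def summarize_via_english(text: str, to_english: TextFn,
                          summarize: TextFn, from_english: TextFn) -> str:
    """Translate to English, summarize there, then translate the summary back."""
    english = to_english(text)
    summary = summarize(english)
    return from_english(summary)


def hf_stage(task: str, model: str, output_key: str) -> TextFn:
    """Wrap a Hugging Face pipeline as a plain str -> str function."""
    # Imported lazily so the orchestration above works without transformers.
    from transformers import pipeline

    pipe = pipeline(task, model=model)

    def run(text: str) -> str:
        # For a single input string, a pipeline returns a list with one dict.
        return pipe(text)[0][output_key]

    return run


def summarize_file(path: str) -> str:
    """Read a Slovak text file and return a Slovak summary."""
    with open(path, encoding="utf-8") as f:
        slovak_text = f.read()
    return summarize_via_english(
        slovak_text,
        to_english=hf_stage("translation", "Helsinki-NLP/opus-mt-sk-en",
                            "translation_text"),
        summarize=hf_stage("summarization", "Falconsai/medical_summarization",
                           "summary_text"),
        from_english=hf_stage("translation", "Helsinki-NLP/opus-mt-en-sk",
                              "translation_text"),
    )
```

Calling `summarize_file("document.txt")` (a placeholder path) would download the three checkpoints from the Hugging Face Hub on first use and load them in their native format, so this standalone sketch involves no GGUF conversion. Long files will likely need to be split into smaller chunks first, since these models accept a limited input length.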

And the result was actually very good. So my suggestion is that PrivateGPT could support configurable translators, so that the LLM and the vector database only have to work with English text. In settings.yaml it could look something like this:

translations:
  from_prompt:
    model: Helsinki-NLP/opus-mt-sk-en
    # source: slk_Latn # can stay commented out when the translation model does not need these arguments
    # target: eng_Latn
  from_response:
    model: Helsinki-NLP/opus-mt-en-sk # can be omitted when it is the same as from_prompt
    # source: eng_Latn
    # target: slk_Latn
  to_store:
    model: Helsinki-NLP/opus-mt-sk-en
    # source: slk_Latn # Slovak
    # target: eng_Latn # English
  from_store:
    model: Helsinki-NLP/opus-mt-en-sk # can be omitted when it is the same as to_store
    # source: eng_Latn
    # target: slk_Latn

What do you think about this? I'm a noob in Python, so I'm not able to build it myself, thx.

For this recipe to work, is it necessary to first convert `Helsinki-NLP/opus-mt-sk-en` to GGUF and put it in PrivateGPT's models folder?