su77ungr / CASALIOY

♾️ toolkit for air-gapped LLMs on consumer-grade hardware
Apache License 2.0

Multilanguage support #103

Open janvarev opened 1 year ago

janvarev commented 1 year ago

Feature request

Increase the efficiency of the system by translating input and output data to and from the user's language.

What we need?

The engine will still search in English (where it works correctly), but the input and output will be translated.

Motivation

I need to efficiently index documents in my native language.

Your contribution

I recommend using this compact function from my project: https://github.com/janvarev/kobold_api_multilang_proxy/blob/main/server.py#L25

def translator_main(string,from_lang:str,to_lang:str) -> str:

It can translate strings either with GoogleTranslator from the deep_translator library (standard; no special setup required) or with my project OneRingTranslator (requires setting up its REST translation server). OneRingTranslator lets the user choose the translation engine and can even translate locally with Meta's NLLB neural network.

The default settings for users can be simple:

If we call translator_main("string", "en", "en"), the string is returned unchanged. So, by default, nothing changes for users who don't want to use translation in their project.

Users who want translation can change the UserLang and TranslationEngine options.
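A minimal sketch of how a `translator_main` with these defaults could be structured. This is not the code from the linked server.py: the engine registry `ENGINES`, the extra `engine` parameter, and the `TranslationSettings` holder (standing in for the UserLang and TranslationEngine options) are hypothetical. The "google" engine uses deep_translator's `GoogleTranslator(source=..., target=...).translate(text)`; a OneRingTranslator client is left out because its REST endpoint is not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# An engine takes (text, from_lang, to_lang) and returns the translated text.
Engine = Callable[[str, str, str], str]


def _google_engine(text: str, from_lang: str, to_lang: str) -> str:
    # Imported lazily so deep_translator is only required if this engine runs.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=from_lang, target=to_lang).translate(text)


# Hypothetical registry; other engines (e.g. a client for a local
# translation server) could be registered under new keys.
ENGINES: Dict[str, Engine] = {"google": _google_engine}


@dataclass
class TranslationSettings:
    # Hypothetical stand-ins for the UserLang / TranslationEngine options.
    # With user_lang left at "en", translation is a no-op.
    user_lang: str = "en"
    translation_engine: str = "google"


def translator_main(string: str, from_lang: str, to_lang: str,
                    engine: str = "google") -> str:
    # Same source and target language: return the input unchanged, so
    # users who never configure translation see no difference.
    if from_lang == to_lang:
        return string
    try:
        translate = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown translation engine: {engine!r}") from None
    return translate(string, from_lang, to_lang)
```

With default settings, `translator_main("string", "en", settings.user_lang, settings.translation_engine)` returns `"string"` without touching the network or importing deep_translator. Keeping engines as plain callables makes it easy to swap in a local model or a stub for testing.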

su77ungr commented 1 year ago

Could be an addition.

So you are basically setting up a translation service in front of your stdout? Do we know the extent of language support in the base model?

Split this into two parts:

Might be useful if you already want the system to output in the right language.

How would that compare with the localisation skills of an LLM queried directly, as opposed to a raw translation?

janvarev commented 1 year ago

Hi! Sorry, for some reason I can't install CASALIOY with poetry :((, but I can install privateGPT with pip. (I hope you will support pip in the future...)

I've prepared a PR for privateGPT; you can see it here: https://github.com/imartinez/privateGPT/pull/325 or here: https://github.com/janvarev/privateGPT

There is translation logic "before" generation and "after" generation; both are optional. The first translates queries into English; the second translates the result back (not necessary, but handy for the user).
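The optional "before" and "after" steps can be sketched as a wrapper around any generation function. This is an illustrative sketch, not the code from the linked PR: `ask_with_translation`, `generate`, and `translate` are hypothetical names, and `translate(text, from_lang, to_lang)` stands for any translator with that shape.

```python
from typing import Callable

Translate = Callable[[str, str, str], str]  # (text, from_lang, to_lang) -> text
Generate = Callable[[str], str]             # English prompt -> English answer


def ask_with_translation(query: str, user_lang: str, generate: Generate,
                         translate: Translate,
                         translate_query: bool = True,
                         translate_answer: bool = True) -> str:
    """Run an English-centric pipeline on a query written in user_lang."""
    # "Before" step: translate the user's query into English so that
    # retrieval and generation happen in English.
    if translate_query and user_lang != "en":
        query = translate(query, user_lang, "en")

    answer = generate(query)

    # "After" step: translate the English answer back for the user.
    if translate_answer and user_lang != "en":
        answer = translate(answer, "en", user_lang)
    return answer
```

With `user_lang="en"` both steps are skipped, and either step can be disabled on its own, matching the description of both steps as optional.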


Supported translation logic:


A word about working with an LLM without English translation: the results are simply significantly worse. Yes, we can query in the native language, but in 50% or more of cases the answers are bad (off-topic, etc.).

janvarev commented 1 year ago

UPD: feel free to take any code you need from the PR above; I'll be glad to see it implemented.

hippalectryon-0 commented 1 year ago

"Sorry, for some reason I can't install CASALIOY with poetry :((, but I can install privateGPT with pip. (I hope you will support pip in the future...)"

Can you open a separate issue detailing what doesn't work?

su77ungr commented 1 year ago

I'm going to compare the translation and localisation results using both OneRingTranslator and guidance.

Google won't make it into this repo, though. Just a brief status update.

janvarev commented 1 year ago

@su77ungr I've added BLEU measurements, and a script to run them, to OneRingTranslator.
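For readers unfamiliar with BLEU: it scores candidate translations by their clipped n-gram overlap (n = 1..4) with references, combines the four precisions as a geometric mean, and multiplies by a brevity penalty for outputs shorter than the references. The sketch below is a simplified version (one reference per sentence, whitespace tokenization, no smoothing) meant only to illustrate the metric. It is not the script added to OneRingTranslator; for reported numbers, a standard implementation such as sacreBLEU, with its own tokenization, is the usual choice.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str],
                max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale, one reference per hypothesis,
    whitespace tokenization, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = _ngrams(h, n), _ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

An identical hypothesis and reference score 100, and a hypothesis with no 4-gram overlap scores 0; the lack of smoothing is why short sentences often need a smoothed variant.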