ufal / charles-translator-web-frontend

Charles Translator: MT from Charles University

Language detection and warning that the input is in Russian #25

Open jlibovicky opened 2 years ago

jlibovicky commented 2 years ago

The client should do automatic language detection, e.g., using the languagedetect library and

  1. Warn the user that the translation quality will be low if the input is in Russian.
  2. Automatically swap the translation direction if needed.

martinpopel commented 2 years ago

I am not sure whether we should swap the translation direction automatically or just suggest it (by showing a button "Translate instead from ..."). BTW: Google Translate swaps automatically and shows a notification that the source language was changed, but it does not provide a way to override this detection, see e.g. https://translate.google.cz/?sl=cs&tl=en&text=city&op=translate - I cannot find a way to translate "city" as "feelings/emotions" without changing the input (e.g. to "moje city").

Once the automatic swap of translation direction is implemented, we could revise the decision in #15 and reimplement the switch-direction button so that it also swaps the contents (as done in Google Translate).
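
A minimal sketch of the "suggest instead of swap" variant discussed above, assuming the detector returns a language code. The names `Direction`, `Advice`, and `adviseDirection`, as well as the language codes, are hypothetical and do not come from the repository.

```typescript
// Hypothetical types for illustration; not taken from the project code.
interface Direction {
  source: string; // language code of the input, e.g. "cs"
  target: string; // language code of the output, e.g. "en"
}

type Advice =
  | { kind: "ok" }
  | { kind: "suggest-swap"; swapped: Direction }
  | { kind: "warn-unsupported"; detected: string };

// Decide what to show the user, without changing the direction silently.
function adviseDirection(
  detected: string,
  current: Direction,
  supported: string[]
): Advice {
  // The input matches the selected source language: nothing to do.
  if (detected === current.source) {
    return { kind: "ok" };
  }
  // The input looks like the target language: offer a
  // "Translate instead from ..." button rather than swapping automatically.
  if (detected === current.target) {
    return {
      kind: "suggest-swap",
      swapped: { source: current.target, target: current.source },
    };
  }
  // The input is in a language we cannot translate (e.g. Russian).
  if (supported.indexOf(detected) === -1) {
    return { kind: "warn-unsupported", detected };
  }
  // A supported language that is neither source nor target:
  // leave the user's choice alone in this sketch.
  return { kind: "ok" };
}
```

With `sl=cs`, `tl=en` and the input "city" detected as English, this returns a swap suggestion that the user is free to ignore, which keeps the Czech reading of "city" reachable.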

stranak commented 2 years ago

Just another option for how to do the detection: https://www.npmjs.com/package/cld

EbrithilNogare commented 2 years ago

Language detection is a big library that should not run on the user's device but on the backend server. A frontend with that capability would take more than 10 s to load even on a fast connection.

So it would be better to check it on the server side when translating.

Then we can show the user a message such as "Do you want to swap the language?".

EbrithilNogare commented 2 years ago

Wouldn't it be simpler to just check whether the alphabet is Russian? Or does Ukrainian use the same keyboard (characters)? If we want it on the front end (and we do want it there), we must use a much simpler check for Russian.
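
A minimal sketch of such an alphabet check, as one possible approach and not necessarily what the project ends up doing. It relies on the fact that Russian and Ukrainian share most Cyrillic letters but differ in a few: ё, ъ, ы, э appear only in Russian, while є, і, ї, ґ appear only in Ukrainian. The names `Guess` and `guessByAlphabet` are hypothetical.

```typescript
// Hypothetical helper for illustration; not taken from the project code.
type Guess = "latin" | "russian" | "ukrainian" | "cyrillic" | "unknown";

const RUSSIAN_ONLY = "ёъыэ"; // letters not used in Ukrainian
const UKRAINIAN_ONLY = "єіїґ"; // letters not used in Russian

function guessByAlphabet(text: string): Guess {
  let latin = 0;
  let cyrillic = 0;
  let russian = 0;
  let ukrainian = 0;

  for (const ch of text.toLowerCase()) {
    const isAsciiLetter = ch >= "a" && ch <= "z";
    // Accented Latin letters (e.g. Czech ý, č, ě), excluding the division sign.
    const isAccentedLatin = ch >= "\u00df" && ch <= "\u024f" && ch !== "\u00f7";
    if (isAsciiLetter || isAccentedLatin) {
      latin++;
    } else if (ch >= "\u0400" && ch <= "\u04ff") {
      // Basic Cyrillic block
      cyrillic++;
      if (RUSSIAN_ONLY.indexOf(ch) !== -1) russian++;
      if (UKRAINIAN_ONLY.indexOf(ch) !== -1) ukrainian++;
    }
  }

  if (latin === 0 && cyrillic === 0) return "unknown";
  if (latin >= cyrillic) return "latin";
  if (russian > ukrainian) return "russian";
  if (ukrainian > russian) return "ukrainian";
  // Cyrillic text without any distinguishing letter.
  return "cyrillic";
}
```

The check is cheap enough for the front end, but it cannot decide short inputs that contain none of the distinguishing letters, and other languages written in Cyrillic would also fall into one of these buckets.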

stranak commented 2 years ago

@maartenpt is using CLD3 in TEITOK, so I asked him to make it available in the API: https://lindat.mff.cuni.cz/services/teitok/tools/api.php?action=langdetect&text=jejich%20význam%20je%20pouze%20anotačn%C3%AD%20a%20informativn%C3%AD

Maybe we could use it for all LINDAT services that need to detect input language?
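
A minimal sketch of how a client might call that endpoint. Only the base URL and the `action` and `text` parameters come from the link above; the response format is not shown in this thread, so the sketch simply returns the raw response body. `buildLangDetectUrl`, `detectLanguageRaw`, and the injected `fetchText` callback are hypothetical names.

```typescript
// Base URL taken from the link above; everything else is an assumption.
const LANGDETECT_ENDPOINT =
  "https://lindat.mff.cuni.cz/services/teitok/tools/api.php";

// Build the request URL, percent-encoding the text (including diacritics).
function buildLangDetectUrl(text: string): string {
  return (
    LANGDETECT_ENDPOINT +
    "?action=langdetect&text=" +
    encodeURIComponent(text)
  );
}

// `fetchText` is injected so the function does not depend on a particular
// HTTP client. In a browser it could be:
//   (url) => fetch(url).then((response) => response.text())
function detectLanguageRaw(
  text: string,
  fetchText: (url: string) => Promise<string>
): Promise<string> {
  // The response format is unknown here, so the body is returned unparsed.
  return fetchText(buildLangDetectUrl(text));
}
```

Keeping the detection behind one small function would make it easy to share across services or to replace the backend later.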

EbrithilNogare commented 2 years ago

I implemented a checker based on counting alphabet characters in commit cb70d4b2dd6bb92ef1e534faf859d397d2d3756e, now available on the development testing server.

EbrithilNogare commented 2 years ago

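These placeholders still need proper text and translations (the current Czech strings read "Russian is not supported" and "The alphabet and the language do not match"):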

        "maybeRusian": "Ruština není podporována",
        "maybeUkrinian": "Nesouhlasí abeceda a jazyk",
        "maybeCzech": "Nesouhlasí abeceda a jazyk"