lokinmodar / Echoglossian

FFXIV Dialogue text translator
Creative Commons Attribution 4.0 International

[Enhancement] Terminologies / Glossaries #18

Open Bluefissure opened 2 years ago

Bluefissure commented 2 years ago

There are terminologies in the game including place names, person names, country names, etc.

For example, in Chinese:

Alisaie -> 艾丽莎 (Google Translate) -- 阿莉塞 (CN client)
Forum -> 论坛 (Google Translate) -- 哲学家议会 (Forum of Sharlayan, CN client)

The translation API can neither detect these terms nor translate them correctly, so I wonder whether we could apply a dictionary of such terminology before or after calling the translation API?

Bluefissure commented 2 years ago

I think a regex match & replacement can be a good start.
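As a rough illustration of the regex idea, here is a minimal Python sketch (not the plugin's actual code, which is not shown in this thread). The `GLOSSARY` dict and the function names `build_pattern` and `apply_glossary` are hypothetical; the two entries are the examples given above. It assumes whole-word matching on the source text is good enough as a first pass.

```python
import re

# Hypothetical glossary: source term -> official client term.
# The two entries are the examples quoted earlier in this thread.
GLOSSARY = {
    "Alisaie": "阿莉塞",
    "Forum": "哲学家议会",
}

def build_pattern(glossary):
    """Compile one alternation that matches any glossary term as a whole word."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(glossary, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

def apply_glossary(text, glossary):
    """Replace every glossary term found in text in a single pass."""
    if not glossary:
        return text
    pattern = build_pattern(glossary)
    return pattern.sub(lambda m: glossary[m.group(0)], text)

print(apply_glossary("Alisaie went to the Forum.", GLOSSARY))
```

Doing the substitution in a single pass means a replacement is never itself re-matched by another glossary entry. The word boundaries keep "Forum" from matching inside "Forums", though they would also miss inflected forms.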

Other than that, DeepL's glossary support (https://www.deepl.com/sv/blog/announcing-glossary-support-for-deepl-api) looks like an attempt at this, but it only works for a limited set of languages.

Bluefissure commented 2 years ago

Replacing the text before calling the translation API seems to work fine.

lokinmodar commented 2 years ago

Nice suggestion about the glossary. I think we can compile some stuff if not yet available somewhere...

Bluefissure commented 2 years ago

Yeah, I think it's easier to implement an interface that reads a glossary CSV file and then replaces the terms. The glossary can be user-generated, with a pre-defined default one that could come from collaborative contributions by players of different languages.
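A minimal sketch of that CSV idea in Python, under stated assumptions: the two-column layout (source term, replacement), the rule that user entries override the default ones, and the names `load_glossary`, `merge_glossaries` and `replace_terms` are all hypothetical, not something the plugin defines.

```python
import csv
import io
import re

def load_glossary(csv_file):
    """Read (source term, replacement) rows from an open CSV file into a dict."""
    glossary = {}
    for row in csv.reader(csv_file):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        glossary[row[0].strip()] = row[1].strip()
    return glossary

def merge_glossaries(default, user):
    """Combine both glossaries; user entries win on conflict (an assumption)."""
    merged = dict(default)
    merged.update(user)
    return merged

def replace_terms(text, glossary):
    """Replace all glossary terms in one pass, longest term first."""
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: glossary[m.group(0)], text)

# In-memory stand-ins for a default and a user CSV file.
default = load_glossary(io.StringIO("Alisaie,阿莉塞\nForum,论坛\n"))
user = load_glossary(io.StringIO("Forum,哲学家议会\n"))
merged = merge_glossaries(default, user)
print(replace_terms("Alisaie at the Forum", merged))
```

With real files, `load_glossary(open(path, newline="", encoding="utf-8"))` would be the usual way to call it.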

lokinmodar commented 2 years ago

My idea is to gather all generated translations per language and upload them to a private spreadsheet so people can revise and contribute, and then the plugin updates the local DB with the corrected entries.

Bluefissure commented 2 years ago

I think it'll be complicated to gather all of the translations, let alone have contributors edit every wrong entry. As for the glossary, there are already several data-mining repos that can be used to generate the terms (from the exported CSV files). There's also https://strings.wakingsands.com/, which compares Chinese/English/Japanese text from the respective clients and could help.

Nikslg commented 2 years ago

I would like to add that occasionally, in the Russian translation, it decides to translate names in a quite particular way. For some reason, "Haurchefant" was translated into "Elephant-shark" in Russian.

lokinmodar commented 2 years ago

Lol. It is something I noticed in Portuguese also. I was thinking of setting the language detection routine to always base itself on the client language instead of trying to guess the language of the name. For some names, guessing can cause issues like the one you had.
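A tiny Python sketch of what pinning the source language could look like. Everything here is hypothetical: `translate_dialogue`, the `translate_fn` callable with its `(text, source, target)` signature, and the fake translator stand in for whichever translation API is actually used.

```python
def translate_dialogue(text, target_lang, client_lang, translate_fn):
    """Translate text with the source language pinned to the game client language.

    translate_fn is a hypothetical callable (text, source, target) -> str.
    """
    # Pass the client language explicitly rather than letting the API guess,
    # so short strings such as character names are not language-detected.
    return translate_fn(text, client_lang, target_lang)

# Fake translator that only records what it was asked to do.
calls = []

def fake_translate(text, source, target):
    calls.append((text, source, target))
    return f"[{source}->{target}] {text}"

result = translate_dialogue("Haurchefant", "ru", "en", fake_translate)
print(result)
```

This does not stop the API from translating a name once the source language is known, so it would likely still need a glossary alongside it.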