scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Create Italian to all other languages translation process #75

Closed: andrewtavis closed this 6 months ago

andrewtavis commented 7 months ago

Description

The goal of this issue is to create a process in which a single file translates all words in Italian/translations/words_to_translate.json into all other Scribe languages. To achieve this we'll be using m2m100_418M, with the output being a JSON file in which each source string is a key whose value holds the keyed translations for each language. This can then be transferred to an SQLite database table in which each string is an indexed row with one column value per language.
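To make the JSON-to-SQLite step described above concrete, here is a minimal sketch using only the Python standard library. The JSON shape (source word mapped to a dictionary of language codes), the table name `translations`, and the helper `load_translations` are assumptions for illustration, not the layout Scribe-Data actually uses, and the sample words are made up rather than model output.

```python
import json
import sqlite3

# Hypothetical output shape: source word -> {language code: translation}.
translations_json = """
{
  "libro": {"en": "book", "de": "Buch", "fr": "livre"},
  "acqua": {"en": "water", "de": "Wasser", "fr": "eau"}
}
"""


def load_translations(conn, translations, languages, table="translations"):
    """Write one row per source word with one column per target language.

    Table and column names are assumed to be trusted, since they are
    interpolated into the SQL text; the values use bound parameters.
    """
    columns = ", ".join(f'"{lang}" TEXT' for lang in languages)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (word TEXT PRIMARY KEY, {columns})'
    )
    placeholders = ", ".join("?" for _ in range(len(languages) + 1))
    rows = [
        # Missing translations become NULL via dict.get returning None.
        (word, *(by_lang.get(lang) for lang in languages))
        for word, by_lang in translations.items()
    ]
    conn.executemany(
        f'INSERT OR REPLACE INTO "{table}" VALUES ({placeholders})', rows
    )
    conn.commit()


translations = json.loads(translations_json)
languages = ["en", "de", "fr"]

conn = sqlite3.connect(":memory:")
load_translations(conn, translations, languages)

row = conn.execute(
    "SELECT en, de, fr FROM translations WHERE word = ?", ("libro",)
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM translations").fetchone()[0]
```

Using the source word as the primary key gives the indexed lookup the description mentions, and a language with no translation for a word is simply stored as NULL.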

Of specific importance is getting a metric for the accuracy of each translation and applying a cutoff so that low-quality translations are not included in Scribe applications :)
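The cutoff idea could look roughly like the sketch below. The issue does not say which accuracy metric would be used, so the per-translation scores here are placeholders, and the function `apply_cutoff`, the `(text, score)` pairing, and the `0.5` threshold are all hypothetical.

```python
def apply_cutoff(scored, cutoff=0.5):
    """Keep only translations whose score is at or above the cutoff.

    `scored` maps source word -> {language code: (translation, score)}.
    The score is a stand-in for whatever accuracy metric is chosen.
    Words left with no surviving translations are dropped entirely.
    """
    kept = {}
    for word, by_lang in scored.items():
        good = {
            lang: text
            for lang, (text, score) in by_lang.items()
            if score >= cutoff
        }
        if good:
            kept[word] = good
    return kept


# Made-up scores purely for illustration.
scored = {
    "libro": {"en": ("book", 0.93), "de": ("Buch", 0.41)},
    "acqua": {"en": ("water", 0.2)},
}

filtered = apply_cutoff(scored, cutoff=0.5)
```

With these made-up scores only the English translation of "libro" survives; the right threshold would need to be chosen by inspecting real output.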

Contribution

Happy to work on this or support someone with interest in working on it!

ikeadeoyin commented 6 months ago

Hello @andrewtavis, I am interested in working on this.

andrewtavis commented 6 months ago

Sounds good, @ikeadeoyin! Let us merge in another process that's for English and then you can use that as a basis. Should be merged by Wednesday 😊

ikeadeoyin commented 6 months ago

Alright, that is okay.

andrewtavis commented 6 months ago

Hey @ikeadeoyin 👋 The process has been set up and we're ready to implement here :) It's actually quite streamlined now. If you make a version of scribe_data/extract_transform/languages/English/translations/translate_words.py that replaces SRC_LANG with Italian we should be good to go here 😊

andrewtavis commented 6 months ago

Hey @ikeadeoyin 👋 I went ahead and sent along the change in 2b72e64 as I had a few other things that I needed to get done, and this needed to get finished up :) Hope all's well!