scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Create French to all other languages translation process #73

Closed: andrewtavis closed this 8 months ago

andrewtavis commented 8 months ago


Description

The goal of this issue is to create a process whereby a single file is used to translate all words within French/translations/words_to_translate.json into all other Scribe languages. To achieve this we'll be using m2m100_418M, with the output being a JSON file that maps each source word to keyed values for each target language. This can then be transferred to an SQLite database table in which each source word is a row and each language is a column.
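A minimal Python sketch of that pipeline, assuming the Hugging Face `transformers` package for M2M100 (model id `facebook/m2m100_418M`) and a translator that takes a batch of words plus a target language code. The names `translate_all`, `save_json`, `to_sqlite` and the `translations` table layout are hypothetical, not Scribe-Data's actual code. The model wrapper follows the standard M2M100 usage of setting `tokenizer.src_lang` and passing `forced_bos_token_id`.

```python
import json
import sqlite3
from typing import Callable, Dict, List

# Assumed interface: translate(words, target_lang) returns one translation per word, in order.
Translator = Callable[[List[str], str], List[str]]


def make_m2m100_translator(src_lang: str = "fr", model_name: str = "facebook/m2m100_418M") -> Translator:
    """Wrap M2M100 from transformers (imported lazily so the rest runs without it)."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    tokenizer.src_lang = src_lang

    def translate(words: List[str], target_lang: str) -> List[str]:
        encoded = tokenizer(words, return_tensors="pt", padding=True)
        # Force the decoder to start generating in the target language.
        generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id(target_lang))
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def translate_all(words: List[str], target_langs: List[str], translate: Translator) -> Dict[str, Dict[str, str]]:
    """Build {word: {lang: translation}} for every word and every target language."""
    result: Dict[str, Dict[str, str]] = {word: {} for word in words}
    for lang in target_langs:
        for word, translation in zip(words, translate(words, lang)):
            result[word][lang] = translation
    return result


def save_json(translations: Dict[str, Dict[str, str]], path: str) -> None:
    """Write the nested dict as UTF-8 JSON, keeping accented characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(translations, f, ensure_ascii=False, indent=2)


def to_sqlite(
    translations: Dict[str, Dict[str, str]],
    target_langs: List[str],
    conn: sqlite3.Connection,
    table: str = "translations",
) -> None:
    """Store one row per source word and one TEXT column per target language."""
    columns = ", ".join(f'"{lang}" TEXT' for lang in target_langs)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (word TEXT PRIMARY KEY, {columns})')
    placeholders = ", ".join(["?"] * (len(target_langs) + 1))
    rows = [
        (word, *(per_lang.get(lang) for lang in target_langs))
        for word, per_lang in translations.items()
    ]
    conn.executemany(f'INSERT OR REPLACE INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

Passing a stub such as `lambda ws, lang: [w + "?" for w in ws]` in place of `make_m2m100_translator("fr")` exercises the JSON and SQLite steps without downloading the model.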

Of particular importance is getting a metric for the accuracy of each translation and applying a cutoff so that low-quality translations are no longer included in Scribe applications :)
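The issue doesn't name an accuracy metric, and without reference translations there's no ground truth to score against. One cheap proxy, offered here as an assumption rather than the project's chosen method, is a round-trip (back-translation) check: translate French into the target language, translate the result back into French, and keep the pair only if the back-translation is close to the original. The sketch uses the standard library's `difflib.SequenceMatcher` for similarity; `round_trip_filter` is a hypothetical name and the 0.8 cutoff is an arbitrary placeholder.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Assumed interfaces: each callable maps a list of strings to a list of the same length.
Forward = Callable[[List[str]], List[str]]   # source (e.g. French) -> target language
Backward = Callable[[List[str]], List[str]]  # target language -> source language


def round_trip_filter(
    words: List[str], forward: Forward, backward: Backward, cutoff: float = 0.8
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Keep translations whose back-translation resembles the original word.

    Returns (kept {word: translation}, scores {word: similarity in [0, 1]}).
    """
    translations = forward(words)
    back_translations = backward(translations)
    kept: Dict[str, str] = {}
    scores: Dict[str, float] = {}
    for word, translation, back in zip(words, translations, back_translations):
        # Case-insensitive character similarity between the original and round-tripped word.
        score = SequenceMatcher(None, word.casefold(), back.casefold()).ratio()
        scores[word] = score
        if score >= cutoff:
            kept[word] = translation
    return kept, scores
```

Round-trip similarity penalizes valid synonyms (a correct translation can come back as a different French word), so it is better suited to flagging entries for review, or to being combined with the model's own generation scores, than to serving as a hard filter on its own.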

Contribution

Happy to work on this or support someone with interest in working on it!

Jk40git commented 8 months ago

@andrewtavis, can you assign me here?

andrewtavis commented 8 months ago

Hey @Jk40git 👋 Before assigning, could you let me know if you have some prior experience with the technologies involved? We're talking about doing a bit of machine translation here. We can work up to that, but if you've never done anything like it before, then maybe we should switch you over to a different issue for now to build your skills a bit. We can save this one until one of the other translation issues is done - maybe by me - and then you can follow that one as an example.

Jk40git commented 8 months ago

Okay, sounds great. I don't have any experience in machine translation, though, so I'll switch to a different issue. 👍 Or, if possible, could you assign me one that will help build my skills?

andrewtavis commented 8 months ago

You can maybe work on this once another one like it is done and you can follow it as an example, @Jk40git :)

I'd suggest:

Jk40git commented 8 months ago


Okay, I'll go for #67 first.

andrewtavis commented 8 months ago

Feel free to write in there so we can assign :)

andrewtavis commented 8 months ago

Hey @Jk40git 👋 The process has been set up and we're ready to implement here :) It's actually quite streamlined now. If you make a version of scribe_data/extract_transform/languages/English/translations/translate_words.py that replaces SRC_LANG with French, we should be good to go here 😊
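The actual contents of translate_words.py aren't shown in this thread, so the following is only a hypothetical sketch of the idea that a single SRC_LANG constant drives the rest of the run. The language list, the name-to-M2M100-code mapping, and the words_to_translate.json location (inferred from the paths mentioned above) are all assumptions, as is the helper name `translation_plan`.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical constant mirroring the SRC_LANG mentioned in the issue.
SRC_LANG = "French"

# Assumed mapping of Scribe language names to M2M100 language codes.
LANGUAGE_CODES = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Russian": "ru",
    "Spanish": "es",
    "Swedish": "sv",
}

# Assumed location, inferred from the paths named in this issue.
LANGUAGES_DIR = Path("scribe_data/extract_transform/languages")


def translation_plan(src_lang: str) -> Tuple[str, List[str], Path]:
    """Return the source code, every other language's code, and the input word file."""
    if src_lang not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported source language: {src_lang}")
    src_code = LANGUAGE_CODES[src_lang]
    target_codes = [code for name, code in LANGUAGE_CODES.items() if name != src_lang]
    words_file = LANGUAGES_DIR / src_lang / "translations" / "words_to_translate.json"
    return src_code, target_codes, words_file
```

With everything keyed off one constant, the French version differs from the English one only in the value of SRC_LANG.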

andrewtavis commented 8 months ago

Thanks for this, @Jk40git! Closed via #108 with minor edits in 3140c02 :)