Closed by andrewtavis 7 months ago
Can I work on this issue?
You certainly can, @mhmohona. Let me just merge one of the other ones so that we can use it as a reference for the others so we have some consistency :) Could you let me know what your Python experience is as well?
No need to fill me in on the Python experience, @mhmohona 😊 Looks great based on your profile! Again let me merge one in, but your feedback on the process would be very welcome!
Hey @mhmohona 👋 The process has been set up and we're ready to implement here :) It's actually quite streamlined now. If you make a version of scribe_data/extract_transform/languages/English/translations/translate_words.py that replaces SRC_LANG with German we should be good to go here 😊
Give it a test to see if it's working on your end by running the script in the header and letting it run for one batch so we can see what comes out!
Thanks for letting me know, @andrewtavis!
@andrewtavis, I tried to run the scribe_data/extract_transform/languages/English/translations/translate_words.py file and got the following error on my device -
From Google Colab -
Have you changed the SRC_LANG to "German" in the file?
OK, I found the problem. After adjusting the parameters it was fixed. Shall I submit the script with my adjustments or the original one (the English script)? Also, including parallel processing could improve the runtime. Can I update the script accordingly, @andrewtavis?
Sounds great, @mhmohona! Let's do a PR for just the German translation script and I'll update the other ones after :) Thank you!
Description
The goal of this issue is to create a process whereby a single file is used to translate all words within German/translations/words_to_translate.json to all other Scribe languages. To achieve this we'll be using m2m100_418M, with the output being a JSON file that has a string and keyed values for each language. This can then be transferred to an SQLite database table with each string in an index corresponding to a column value for each language.
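As a rough illustration of the last step, here is a minimal sketch of moving such a JSON output into an SQLite table with one row per source word and one column per language. The sample words, the table name `translations`, and the helper `load_translations_into_sqlite` are hypothetical and not taken from the Scribe-Data code; the exact JSON layout used by the project may differ.

```python
import json
import sqlite3

# Hypothetical sample of the translation output: each German source word
# maps to keyed translations per target language (values are illustrative).
translations_json = """
{
  "Haus": {"English": "house", "French": "maison", "Spanish": "casa"},
  "Buch": {"English": "book", "French": "livre", "Spanish": "libro"}
}
"""


def load_translations_into_sqlite(json_text, connection, table="translations"):
    """Create one row per source word and one column per target language."""
    data = json.loads(json_text)

    # Collect every language that appears so that each one gets its own column.
    languages = sorted({lang for entry in data.values() for lang in entry})
    columns = ", ".join(f'"{lang}" TEXT' for lang in languages)
    connection.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (word TEXT PRIMARY KEY, {columns})'
    )

    # One placeholder for the source word plus one per language column.
    placeholders = ", ".join("?" for _ in range(len(languages) + 1))
    rows = [
        (word, *(entry.get(lang) for lang in languages))
        for word, entry in data.items()
    ]
    connection.executemany(
        f'INSERT OR REPLACE INTO "{table}" VALUES ({placeholders})', rows
    )
    connection.commit()
    return languages


connection = sqlite3.connect(":memory:")
languages = load_translations_into_sqlite(translations_json, connection)
row = connection.execute(
    'SELECT "French" FROM translations WHERE word = ?', ("Haus",)
).fetchone()
print(languages, row)
```

A missing translation for a language simply becomes `NULL` in that column, which keeps the table shape the same for every word.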
Of specific importance is trying to get a metric of the accuracy of the translation and doing a cutoff such that we're no longer including low quality translations in Scribe applications :)
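One possible shape for that cutoff, sketched under loud assumptions: the issue does not say which accuracy metric would be used, so the code below simply assumes that each translation arrives with a quality score between 0 and 1. The `score` field, the `0.8` threshold, and the helper `filter_by_score` are all hypothetical.

```python
# Hypothetical scored output: the "score" values are made up for illustration
# and stand in for whatever accuracy metric ends up being chosen.
scored_translations = {
    "Haus": {
        "English": {"translation": "house", "score": 0.97},
        "French": {"translation": "maison", "score": 0.91},
    },
    "Buch": {
        "English": {"translation": "book", "score": 0.95},
        "French": {"translation": "livraison", "score": 0.42},
    },
}


def filter_by_score(scored, cutoff=0.8):
    """Keep only translations whose (assumed) quality score meets the cutoff."""
    kept = {}
    for word, per_language in scored.items():
        good = {
            lang: item["translation"]
            for lang, item in per_language.items()
            if item["score"] >= cutoff
        }
        # Drop a word entirely if none of its translations pass the cutoff.
        if good:
            kept[word] = good
    return kept


filtered = filter_by_score(scored_translations, cutoff=0.8)
print(filtered)
```

The filtered dictionary has the same word-to-language shape as the unscored output, so it could be passed on to later steps without the low-scoring entries.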
Contribution
Happy to work on this or support someone with interest in working on it!