Hi @aronasorman,
How do I replicate this issue? And how does fetching translations work?
@jesumer the `languagepackdownload` management command is the file you're looking for!
Hi, I am attaching the email from @jamalex relating to this issue, along with the scripts attached for reference. As @aronasorman said, after due modifications we may be able to open-source these!
FYI: I can't attach `*.py` or `*.zip` files here, so I added them to Google Drive. Here's the link to the folder with the scripts: https://drive.google.com/open?id=0Bwf_-YL9LvlVN2lVX1hEeWRRRlE&authuser=0
Start of Email History:
From: Aron Asor
Date: Mon, Dec 20, 2014 at 2:10 AM
This seems pretty tricky. I'll go ahead and read their stuff, and I'll give
an assessment in a bit on how hard it is to add this to our pipeline.
Best case, we open-source this before they do!
---------- Forwarded message ----------
From: James Irwin
Date: Mon, Dec 15, 2014 at 7:45 PM
Subject: Re: Internationalization tools for khan-exercises and Perseus
To: Jamie Alexandre
Cc: Ben Eater, Aron Asor, Richard Tibbles
Hi Jamie,
Sorry for delayed response. Unfortunately the scripts to build the
translated version of the html files are pretty intertwined with the
rest of our compiler and it is not easy to remove. I have attached
exercises/babel.py and kake/translate_exercises.py which uses
kake/translate_javascript. We would like to refactor and open-source
this at some point, but it's a substantial amount of work so we can't do
it soon. With some reworking you should be able to use the
translate() function in translate_exercises to create the files for
the other languages.
In terms of translating perseus questions you have the right idea. It
also has some trickiness though. I've added our
assessment_items/models.py file which has a
traverse_natural_language_parts() method and an
assessment_items/i18n.py file which has a function
translate_serialized_assessment_item that will translate the fully
parsed item.data. It also translates decimals automatically in case
translators have not done so.
Hope this helps and you can get it working for all languages. Let me
know if you have any further questions here.
Best,
James
End of Email History
To expound on this, there need to be two translation systems, one for khan-exercises and another for Perseus. The difference stems from how the questions are stored.
For khan-exercises, the questions are stored in the static html files. My guess is that the way we wanna translate here is to replace the text statically and write that to the zip, or to a folder that is then zipped up.
For Perseus, exercise questions are stored inside `ka-lite/data/khan/assessmentitems.json`. It's a JSON map, with each item's `item_data` containing the data for the question. You'll want to create a function that takes in the `assessmentitems.json` in English and outputs a new JSON file that has been translated.
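A minimal sketch of what such a function might look like, assuming `item_data` has already been parsed into nested dicts/lists and assuming some `translate_string()` callable (e.g. a gettext wrapper) to apply to each string; the helper names here are mine, and it's written in modern Python for brevity:

```python
import json


def translate_assessment_items(input_path, output_path, translate_string):
    """Read assessmentitems.json, translate every leaf string, and write
    the translated copy to output_path.

    translate_string is a callable (e.g. a gettext wrapper) that returns
    the translated string, or the input unchanged if no translation exists.
    """
    def walk(node):
        # Recurse through nested dicts/lists; only leaf strings are translated.
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(value) for value in node]
        if isinstance(node, str):
            return translate_string(node)
        return node

    with open(input_path) as f:
        items = json.load(f)  # roughly {item_id: {"item_data": {...}, ...}, ...}

    with open(output_path, "w") as f:
        json.dump(walk(items), f)
```

In practice only the human-readable fields would need translating, but the input/output shape would be the same.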
I'm not exactly sure what James' py files contain, but they should provide some insight into how their i18n pipeline works. We might wanna start by tackling the Perseus exercises, since that might be easier, and then open another issue for khan-exercises.
Hi @aronasorman,
I have verified the `languagepackdownload` command and used "de" as the language. I noticed that something is wrong with the `languagepackdownload` process. As you can see, I had already downloaded the de language pack from https://learningequality.org into my local install (see the screenshot).
The process was then supposed to run and, as I understand it, add a "de" folder at ka-lite/kalite/i18n/static/khan-exercises (see the screenshot).
Did you set German as your default language?
It should've been added into the `ka-lite/locale` directory. Since there are no translated exercises for German, there will be no de folder in khan-exercises.
Hi @aronasorman,
Ah, I see. Yes, I set German, which is "de".
Hi @aronasorman,
My updates: Cyril helped me out with how the `languagepackdownload` command works on the local central server and how the `update_language` command works on the local distributed server. On the distributed side, we spent some time figuring out how the "select language pack" dropdown works and why it was empty. We figured out that it uses the static folder: the static folder should contain a data folder with the "language_pack_availability.json" file, which is used to populate the dropdown menu via the populate_installable_lang_pack_dd() JS function. I am not yet finished verifying and tracing. I am also not yet finished with "python manage.py update_language_packs --no-dubbed --no-ka-trans --no-srts --no-exercises", which is currently downloading. Will continue on the issue after that. Thank you
Findings with Cyril.
We tackled the contentload command on the distributed server and the update_language_packs command on the central server. We found out that the JSON files (exercises.json, topics.json, etc.) at kalite/data/khan on the distributed side come from the Khan Academy API, something the central server does not do.
I searched for how the assessment JSON is built and ended up at kalite/distributed/static/perseus/get_all_items.py, playing around with the URLs it uses, e.g. changing the language code at the end.
We propose to create an app for this; the details will be discussed with Cyril.
Hi @aronasorman - as per our last talk with @jesumer, here are the things we need to do for this issue:
1. `update_language_pack` to download Khan Academy strings from Crowdin and build the "deutsch language pack".
2. `languagepackdownload -l de` to download the language pack.

Hi All,
As per my progress yesterday, I was able to replicate the issue and determine which strings need to be internationalized in the Perseus exercise, as seen in the screenshot.
And here is where I planned to get the strings from the Assessment Item:
Now the problem is that the assessment item has special characters in it, which I think are important. Here is a sample of the item JSON data:
Cyril suggested putting "gettext()" into the template, but I still need to figure out where that happens and where to find that template.
@jesumer Let's take a look at the scripts that @jamalex forwarded as @aronasorman suggested.
Let's ditch the `gettext()` javascript suggestion I made earlier and see if:
Hi All,
I was able to use the i18n.py script from James at Khan Academy. I noticed it uses these import modules:
Some of the functions are important, like i18n._(text), i18n.request_language_decimal_format(), and i18n.format_decimal(). I need to know where I can get these. Thank you.
Updates:
I pulled the latest develop branch and successfully ran the Perseus exercises. For the assessment JSON, I tried wrapping the whole thing like _(ASSESSMENTITEMS), but that doesn't work and raises errors because of special characters inside like '\r\n', '\r', and '\n'. So now I have started writing code to make the assessment JSON translatable by wrapping its strings in the _() function.
Jesumer
Updates and findings:
I have tried to find sample German-language strings that have been approved on Crowdin. And this is for the French language:
I set the German language and this is the output on our site: And for French:
I also tried it in the Python shell. Here is the German-language sample:
And the French:
Then, after running "languagepackdownload" on the distributed server, I checked for the strings (e.g. "Create a picture graph to show how many teeth each student has lost.") in the .mo file located at ka-lite/locale/de/LC_MESSAGES/ and also at ka-lite/locale/fr/LC_MESSAGES/, but unfortunately the string doesn't exist there. Is there anything that I missed? Based on my understanding, the .mo files from the central server have all the strings translated according to the language we set. I also have the Python scripts to wrap the untranslated strings from our assessment items in gettext or _().
Best, Jesumer
Hi @jesumer - you must copy the whole untranslated string from Crowdin (not just a portion) and use that in your Python shell. Example, based on your link above:
https://crowdin.com/translate/khanacademy/27617/enus-de
**Create a picture graph to show how many seeds Johnny Appleseed planted in each location.**\n\nLocation | Apple seeds \n- | :-: | -\nCharleston | $2$ \nLouisville | $6$ \nRichmond | $5$ \nSpringfield | $6$ \n\n![](https://ka-perseus-images.s3.amazonaws.com/a5b68232872a3e078a942f9c298e815b2a92f4e9.png)\n\n[[☃ plotter 1]]\n
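As a sanity check, a lookup like the following from a Django shell on the distributed server is one way to see whether the translation gets picked up (assuming the de language pack is installed; `ugettext` is the Python 2 / Django 1.x API KA Lite used at the time):

```python
from django.utils import translation
from django.utils.translation import ugettext as _

translation.activate("de")

# The msgid must match the Crowdin source string exactly -- markdown table,
# TeX, image URL, pseudo-markup and trailing newline included.
msgid = (u"**Create a picture graph to show how many seeds Johnny Appleseed "
         u"planted in each location.**\n\nLocation | Apple seeds \n- | :-: | -\n"
         u"Charleston | $2$ \nLouisville | $6$ \nRichmond | $5$ \n"
         u"Springfield | $6$ \n\n"
         u"![](https://ka-perseus-images.s3.amazonaws.com/"
         u"a5b68232872a3e078a942f9c298e815b2a92f4e9.png)\n\n"
         u"[[\u2603 plotter 1]]\n")

print(_(msgid))  # echoes the English msgid if no translation is found
```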
@jesumer I have just filed a related issue at https://github.com/fle-internal/ka-lite-central/issues/225 on central repo.
Please verify if that indeed affects/fixes this issue.
Hi,
We have already downloaded the translated po files on the central server, so we can replicate this in our distributed console and have it translated. Here is the screenshot.
These are the po files from Crowdin, searched for the translated string ("Telling time without labels").
Now the problem is that we can't find the string ("Telling time without labels") in the distributed browser. I think the assessment items JSON is the problem.
Here is the topic tree
And here is the result at the browser
Jesumer
Hi,
I'm working on the multiple hints.
Jesumer
Hi,
I'm done with the multiple hints.
Jesumer
Hi all,
I have already refactored my script at get_assessment_item_data. We can now use it. Thanks
Jesumer
Hi,
After running the Perseus exercise with the language code defaulted to de, here is an example of my scripts working in the browser.
Notice the Answer pane on the right: it has a string translated into German.
(Note that the Perseus exercises change randomly every time the page is refreshed, so just find any example of a Perseus exercise that already has translated strings and test it in the browser.)
Jesumer
DE is still working on the review and approval for geo; I recommend you use the early-math exercises for testing, as 95% of the strings there are already translated and approved.
Hi @jesumer, I will review this issue and the PR you made so this issue can be closed.
HOLD UP! No one close this issue yet.
@aronasorman what remains before closing this issue?
I just merged it to make it easier to test the dummy language packs. Once the tests are fixed for that PR we can merge that, and everyone can test i18n.
With both dummy language packs and perseus exercises merged in, we can now properly test this issue.
I created a dummy language pack by running:
bin/kalite manage create_dummy_language_pack
I then switched to Esperanto (the name for the dummy language), and got this:
So the interface is partially translated. However, when I open the exercises, I don't see any translated strings:
It seems like the strings aren't in the fetched po file in the first place.
Ah, seems like there's no `en` language in Khan Academy, so it might not be a good base language pack after all.
String is in `django.po`, but it's not getting substituted in the exercise:
Works in the terminal though:
Turns out I just needed to refetch the assessment items. I'm now getting this:
Notice the accented instructions on the right side, but unaccented strings on the question area.
Hints are translated:
Ok, I think I found the issue. Django's `gettext` can't seem to find the translated string, even though it's in the po file and looks exactly the same.
See this entry in the po file, with accents and all:
But when I go to pdb, `ugettext` can't find it:
Looks like it could be an issue of how the pieces are being chunked up? Are we using that code that KA sent us?
@jamalex Yeah, all the exercise-related strings are from Khan Academy. They might be using a different system for localization?
Right, the strings are from them. But they shared code with us that parsed `item_data` into the translatable chunks or something, right? The po file strings don't just contain the entire contents of each `item_data` field, do they?
Looks like we just do: answerarea_content = _(answerarea_content)
Is that what KA does?
How would that work when we have changed the URLs, btw?
> How would that work when we have changed the URLs, btw?
I think the issue here is how Python's `gettext` finds strings, as polib can find the exact same strings (and thus fetch the translations) while `_` can't:
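For anyone reproducing this, the comparison was roughly of this shape (a sketch only; the po path and language code are illustrative, Python 2 era, and the example string is the one quoted further above):

```python
import polib
from django.utils import translation
from django.utils.translation import ugettext as _

source = u"**Which choices represent the number $550$ ?**\n\n[[\u2603 radio 1]]"

# polib finds the entry by exact msgid match and returns its translation...
po = polib.pofile("locale/eo/LC_MESSAGES/django.po")
entry = po.find(source)
print(entry.msgstr if entry else "not found in po file")

# ...while the same unicode string passed through Django's ugettext falls
# back to the untranslated msgid.
translation.activate("eo")
print(_(source))
```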
@jamalex we didn't get too much use out of KA's files, as we needed (I think) 5 more modules. I believe it's too tied to their code.
Took a bit of digging, but I finally found a dict that maps a string to its translated counterpart, as read by Python's `gettext`:
from django.utils import translation
translation.activate('eo') # "eo" is the code for the DEBUG language.
catalog = translation.trans_real.catalog()._catalog
`catalog` is what you're looking for.
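A couple of quick things you can do with it, for reference (the example key is just an interface string I'd expect the dummy pack to contain, so treat it as illustrative):

```python
len(catalog)           # number of translated entries loaded
u"Sign in" in catalog  # check whether a given source string is present
catalog.get(u"Sign in")  # the "accented" dummy translation, or None
```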
So, it turns out to be an encoding issue.
Django reads in the PO file strings as (UTF-8-encoded) bytestrings:
'**Which choices represent the number $550$ ?**\n\n[[\xe2\x98\x83 radio 1]]'
However, we read in assessment items as unicode:
u'**Which choices represent the number $550$ ?**\n\n[[\u2603 radio 1]]'
The solution is to encode the untranslated strings to UTF-8 before passing them to the `gettext` function:
question_content = _(item_data['question']['content'].encode('utf-8'))
And thus we get:
So now the problem is the `accenting` module translating KA's pseudo-markup, breaking the exercise code.
Wooooot!
So I got this specific exercise translated:
However, it looks like I'm gonna have to go through each type of exercise, find their structure, and translate them.
Just created a function that will automatically translate all types of questions \o/.
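The real function lives in the PR, but for context, a rough sketch of its general shape, assuming the item structure seen above (translatable text living under "content" keys in the question, hints, and answer area) and the Python 2 byte/unicode handling from the encoding fix, might be:

```python
from django.utils.translation import ugettext as _


def translate_item_data(node):
    """Recursively walk a parsed item_data structure and translate the
    "content" fields in place, encoding each string to UTF-8 before the
    gettext lookup (see the encoding fix above).

    Illustrative sketch only; the actual function in the PR may differ.
    """
    if isinstance(node, list):
        return [translate_item_data(child) for child in node]
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, basestring):
                # gettext keys in the fetched po files are UTF-8 bytestrings,
                # so encode before the lookup; ugettext returns unicode.
                node[key] = _(value.encode("utf-8"))
            else:
                node[key] = translate_item_data(value)
        return node
    return node
```

That way every question type is covered by the same walk instead of special-casing each exercise structure.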
While waiting on #3342, I'll work on getting the other text translated. These are most likely related to backbone.js + `djangojs` issues.
Right now we only support officially internationalized KA sites like fr, pt-br and es. We wanna expand that to any user-translated exercises like de. We should also fetch translations for the Perseus exercises.