llaske / sugarizer

Sugarizer is a web implementation of the Sugar platform to run on any device or browser
https://sugarizer.org
Apache License 2.0

Bad localization for some countries name in Color My World activity #265

Closed llaske closed 5 years ago

llaske commented 5 years ago

Some country names are not displayed correctly in the Color My World activity, for example {{Syria}}, {{Libya}}, {{Ivory Coast}}, ...

[Screenshot, 2019-01-24 15:37:46]

This is caused by a mismatch between the country name in the .geojson files and the country name in the locale.ini file. The idea is to fix every country where this problem applies by making the name used for each language in locale.ini match the name used in the .geojson file. Note that the same country is present in several files (africa.geojson and world.geojson, for example, for Ivory Coast). Source code for the Color My World activity is here: https://github.com/llaske/sugarizer/tree/dev/activities/ColorMyWorld.activity
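A minimal sketch of how such a mismatch could be detected, not the project's own tooling. It assumes each GeoJSON feature stores the country name under `properties["name"]` and that locale.ini uses plain `key=value` lines under `[section]` headers; the helper names and the sample data are invented for illustration.

```python
import json

def geojson_names(geojson_text, name_key="name"):
    """Collect the country names of a GeoJSON FeatureCollection (name_key is an assumption)."""
    features = json.loads(geojson_text).get("features", [])
    return {
        f["properties"][name_key]
        for f in features
        if name_key in (f.get("properties") or {})
    }

def ini_keys(ini_text):
    """Collect every left-hand-side key of an INI text, ignoring section headers and comments."""
    keys = set()
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("[", "#", ";")) or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys

def unmatched(geojson_text, ini_text, name_key="name"):
    """Return the GeoJSON names that have no matching INI key."""
    keys = ini_keys(ini_text)
    return sorted(n for n in geojson_names(geojson_text, name_key) if n not in keys)

# Invented sample data, only to exercise the functions.
SAMPLE_GEOJSON = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Libyan Arab Jamahiriya"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "France"}, "geometry": None},
    ],
})
SAMPLE_INI = "[en]\nLibya=Libya\nFrance=France\n"

print(unmatched(SAMPLE_GEOJSON, SAMPLE_INI))
```

Running the same check over every .geojson file in the activity would give a first list of names to fix.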

b282022 commented 5 years ago

I'm interested in working on this and would like to explore the codebase. Can you give me initial pointers for it? Thanks!

quozl commented 5 years ago

Thanks. Please look at https://github.com/llaske/sugarizer/tree/dev/activities/ColorMyWorld.activity

llaske commented 5 years ago

@b282022, See an example below for Ivory Coast.

The left-hand side of the INI property here: https://github.com/llaske/sugarizer/blob/dev/activities/ColorMyWorld.activity/locale.ini#L138

should match the name value here: https://github.com/llaske/sugarizer/blob/dev/activities/ColorMyWorld.activity/data/world/world.geojson?short_path=75ffc99#L33

and the name value here: https://github.com/llaske/sugarizer/blob/dev/activities/ColorMyWorld.activity/data/africa/africa.geojson?short_path=ec530db#L22

In this case:

The fix should do the same thing for all countries.

llaske commented 5 years ago

@quozl (I saw your initial answer) FYI, in Sugarizer, localization lives in the .INI file and is then converted to PO by a dedicated tool.

b282022 commented 5 years ago

@llaske So I need to find all the countries in all the .geojson files whose NAME starts with quotes and update them?

llaske commented 5 years ago

@b282022 Yes, but not only those. Some values are simply spelled differently across these files. Take Libya: it is "Libyan Arab Jamahiriya" in some files and just "Libya" in others.

b282022 commented 5 years ago

So it seems this task has to be done manually: I need to collect all the names found in the different geojson files, then match them against the keys (left-hand side) in the locale.ini file. I also have to settle on one common name wherever the same country has different names in different geojson files. Do you have any idea how this task could be automated? PS: This is my first time working on an open source project, so please bear with me.

b282022 commented 5 years ago

Also @llaske, can you please tell me how many times the same country can appear, under different names, across the geojson files?

What I am asking is how the geojson files are organized. Is it one file per continent, with every country also included in world.geojson, or is there a country X with entries in more than 2 geojson files?

b282022 commented 5 years ago

So after a bit of scripting, I was able to do a frequency analysis of the country names, and I created this JSON file for reference.

As can be seen from the JSON file, the names fall into three groups. Some countries appear 3 times across the geojson files (I don't know how the geojson files are made, so I don't know why 3). Some appear 2 times (maybe once in a continent's geojson and once in world.geojson). The rest appear only once; those are the ones of interest, and they have to be merged under a single name.
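The frequency analysis described above could be sketched roughly as follows. This is not the script actually used in this thread: it assumes the country name is stored under `properties["name"]`, and the file contents below are invented sample data.

```python
import json
from collections import Counter

def name_frequencies(geojson_texts, name_key="name"):
    """Count in how many files each country name appears.

    geojson_texts maps a file name to its GeoJSON text.
    """
    counts = Counter()
    for text in geojson_texts.values():
        features = json.loads(text).get("features", [])
        # Use a set so a name repeated inside one file is counted once per file.
        names = {
            f["properties"][name_key]
            for f in features
            if name_key in (f.get("properties") or {})
        }
        counts.update(names)
    return counts

def appears_once(counts):
    """Names found in a single file only: likely spelled differently elsewhere."""
    return sorted(name for name, n in counts.items() if n == 1)

def feature_collection(names):
    """Build a tiny GeoJSON text for the invented sample below."""
    return json.dumps({
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"name": n}, "geometry": None}
            for n in names
        ],
    })

# Invented contents; only the file names come from the issue.
SAMPLE_FILES = {
    "world.geojson": feature_collection(["Libya", "Egypt"]),
    "africa.geojson": feature_collection(["Libyan Arab Jamahiriya", "Egypt"]),
}

counts = name_frequencies(SAMPLE_FILES)
print(appears_once(counts))
```

Names reported by `appears_once` are candidates for the "same country, different spelling" case and still need a manual look.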

So, I request you to guide me on how to approach this problem further. Thanks!

b282022 commented 5 years ago

Clearly, Libya, Syria and Ivory Coast have different names across the geojson files. At a rough glance, the other countries with different names across the geojson files are Vietnam, East Timor, Moldova, Brunei, ...

I'm not an expert on country names, but I'm sure there are many more countries like the ones mentioned above.

llaske commented 5 years ago

Nice work @b282022 !

My opinion is that every country should appear at least 2 times: once in world.geojson and once in its continent's .geojson file. I guess the countries that appear 3 times are islands in the middle of some ocean. So countries that appear only 1 time are probably mistakes: the name used in the world file is not the same as the one in the continent file.

Of course, you also have to check that the (same) name appears on the left-hand side in the .INI file. Note that the same name should appear N times in the .INI file (N is the number of sections in the INI file).
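The "N times" rule could be checked with a short script along these lines. It is only a sketch under assumptions: it treats locale.ini as `[section]` headers followed by `key=value` lines, uses a hand-written parser rather than any Sugarizer tool, and runs on an invented sample.

```python
def parse_ini(ini_text):
    """Parse INI text into {section: {key: value}} (simplified, assumed format)."""
    sections = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections

def keys_missing_somewhere(sections):
    """Map each key that is absent from at least one section to those sections."""
    all_keys = set().union(*(s.keys() for s in sections.values()))
    report = {}
    for key in sorted(all_keys):
        missing = sorted(name for name, s in sections.items() if key not in s)
        if missing:
            report[key] = missing
    return report

# Invented sample: "Syria" is missing from the [fr] section.
SAMPLE_INI = """
[*]
Libya=Libya
Syria=Syria
[en]
Libya=Libya
Syria=Syria
[fr]
Libya=Libye
"""

print(keys_missing_somewhere(parse_ini(SAMPLE_INI)))
```

An empty report means every key appears once per section, which is the N-times condition.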

Regarding country names, I think we could use the UN country names: http://www.un.org/en/member-states/ So we could say that names in the geojson/INI files should match the UN country names.

b282022 commented 5 years ago

Again, after a bit of scripting, I found that these keys (countries) don't appear N times (N = 10)

b282022 commented 5 years ago

@llaske Here is the list of countries with conflicting names and the proposed corrections. The highlight conventions are described in the list. Conflicts are resolved according to the names in the UN members list. Some countries cannot be resolved this way, so I am waiting for more guidance on how to handle them. Also, I couldn't find where in the code the displayed name is resolved when a country is clicked. Once I understand that flow, I can make the appropriate changes in the locale.ini file as well, based on the conflicts resolved so far.

llaske commented 5 years ago

Good analysis @b282022

Here is the code where the names are displayed: https://github.com/llaske/sugarizer/blob/dev/activities/ColorMyWorld.activity/lib/colormyworld.js#L541 It takes the name value from the geojson file, replaces '_' with ' ', and gets the value for this key (left side of the INI file) in the current language (the language's section in the INI file).
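The lookup described above can be illustrated with a small Python sketch. It is not the code in colormyworld.js: the function name, the sample table and the underscore-separated sample name are invented. Two behaviours are assumptions drawn from elsewhere in this thread: a language without its own section falls back to the `*` section, and a missing key is returned wrapped in double braces, like the `{{Syria}}` shown in the issue description.

```python
def display_name(geojson_name, lang, sections):
    """Resolve the label shown for a country (illustrative sketch only).

    sections maps an INI section name ("en", "fr", "*", ...) to its key/value table.
    """
    # The geojson name becomes the INI key once '_' is replaced by ' '.
    key = geojson_name.replace("_", " ")
    # Assumption: languages without a section of their own use the "*" section.
    table = sections.get(lang)
    if table is None:
        table = sections.get("*", {})
    # Assumption: an unresolved key is rendered as {{key}}.
    return table.get(key, "{{" + key + "}}")

# Invented sample table.
SECTIONS = {
    "*": {"Ivory Coast": "Ivory Coast"},
    "en": {"Ivory Coast": "Ivory Coast"},
    "fr": {"Ivory Coast": "Côte d'Ivoire"},
}

print(display_name("Ivory_Coast", "fr", SECTIONS))
print(display_name("Syria", "en", SECTIONS))
```

This makes the failure mode concrete: the translation exists, but the key built from the geojson name does not match any left-hand side in the INI file.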

Regarding the countries with conflicts: they are North Korea and South Korea. I think we need to use these names instead of "Korea" and "Korea republic". Alternatively, I found another source of country names (https://www.nationsonline.org/oneworld/countries_of_the_world.htm) which displays the common name first, then the prefix: "Moldova, Republic of" instead of "Republic of Moldova", "Syria, Syrian Arab Republic" instead of "Syrian Arab Republic" and "Macedonia, Republic of" instead of "The former Yugoslav Republic of Macedonia". I think it's more readable this way.

b282022 commented 5 years ago

@llaske Thanks for the help 😄 I really appreciate you smoothing my work on my very first issue. For the conflicting countries, I used the same name across the different geojson files and on the left-hand side of the locale.ini file, and the bug was solved. I don't think I will be able to correctly translate the proper names under the new convention into languages other than English, but I'll make a branch from the dev branch and push the code tomorrow with the new naming convention in mind, so that you can review it and we can close this issue :D

Also, why are there keys under *? Is it some kind of fallback for other languages, so that if the translation for a language is not found, the name is displayed according to the entries under *?

Also, please let me know how I am supposed to commit the changes. That is, do I change the geojson files and locale.ini directly, or do I have to change these files in some other way?

llaske commented 5 years ago

It's fine if you fix the values only in English. Just ensure that the left-hand side in locale.ini is right for all languages. Translators will be able to fix the other strings later in http://translate.sugarizer.org/projects/sugarizer/activity-colormyworld/

The * section in the .INI file is for languages used by the UI but not found in the .INI file. Basically, it's fine to put the same values there as the English values.

Once you have finished your fix, create a new branch from the dev branch with all the updated files (ini, geojson, ...), then send a PR.

llaske commented 5 years ago

Fixed in https://github.com/llaske/sugarizer/pull/271

llaske commented 5 years ago

Closed in https://github.com/llaske/sugarizer/commit/dd3c58d56981e4dfa6d94601b95d7117a3bc9241