interscript / geonames-transliteration-data

GeoNames data parsed into transliteration pairs

Some translations are not correct in mon_Cyrl2Latn_BGN_1964.csv #7

Open javkhaanj7 opened 3 years ago

javkhaanj7 commented 3 years ago

Here are a couple of issues we need to fix:

  1. Mongolian uses only Cyrillic letters, but 2 example names are written in Chinese characters: 吉兰泰镇 and 吉兰泰. Solution:

mon_Cyrl2Latn_BGN_1964,mon,17600805,NS,"吉兰泰镇","吉兰泰镇",17600738,N,"Jarantai Zhen","Jarantai Zhen"

replace by:

mon_Cyrl2Latn_BGN_1964,mon,17600805,NS,"Жарантай Сум","Жарантай Сум",17600738,N,"Jarantai Sum","Jarantai Sum"

mon_Cyrl2Latn_BGN_1964,mon,17600803,NS,"吉兰泰","吉兰泰",17600737,N,Jarantai,Jarantai

replace by:

mon_Cyrl2Latn_BGN_1964,mon,17600803,NS,"Жарантай","Жарантай",17600737,N,"Jarantai","Jarantai"

  2. Wrong transliteration for the х character. Solution: mon_Cyrl2Latn_BGN_1964,mon,18973607,VS,"Сүхбаатар","Сүхбаатар",-3255713,V,"Sükhbaatar","Sükhbaatar" replace by: mon_Cyrl2Latn_BGN_1964,mon,18973607,VS,"Сүхбаатар","Сүхбаатар",-3255713,V,"Sühbaatar","Sühbaatar"

  3. Full names are in the wrong order. Solution: mon_Cyrl2Latn_BGN_1964,mon,11005031,V,Orhon,Orhon,11005352,VS,"Орхон","Орхон" replace by: mon_Cyrl2Latn_BGN_1964,mon,11005031,V,"Орхон","Орхон",11005352,VS,Orhon,Orhon
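Two of the problems above (Chinese characters in the source column, and source and target names swapped) could probably be found automatically by checking which script each name is written in. The following is only a minimal sketch: it assumes each row has 10 fields with the source name at index 4 and the target name at index 8, which is inferred from the example rows in this issue and not from any documented schema. The helper names `has_cyrillic`, `has_han`, `classify` and `classify_all` are hypothetical. It does not catch the х problem, which needs a rule-level check.

```python
import csv
import io


def has_cyrillic(text):
    # Cyrillic and Cyrillic Supplement blocks (U+0400..U+052F).
    return any("\u0400" <= ch <= "\u052f" for ch in text)


def has_han(text):
    # CJK Unified Ideographs block (U+4E00..U+9FFF).
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def classify(row):
    # Assumed layout: index 4 = source name, index 8 = target name.
    source, target = row[4], row[8]
    if has_han(source):
        return "han_source"   # wrong system label, like 吉兰泰镇
    if not has_cyrillic(source) and has_cyrillic(target):
        return "swapped"      # Latin first, Cyrillic second
    if has_cyrillic(source) and not has_cyrillic(target):
        return "ok"           # plausible Cyrillic -> Latin pair
    return "unknown"


# The three "before" rows quoted in this issue.
SAMPLE = '''mon_Cyrl2Latn_BGN_1964,mon,17600805,NS,"吉兰泰镇","吉兰泰镇",17600738,N,"Jarantai Zhen","Jarantai Zhen"
mon_Cyrl2Latn_BGN_1964,mon,18973607,VS,"Сүхбаатар","Сүхбаатар",-3255713,V,"Sükhbaatar","Sükhbaatar"
mon_Cyrl2Latn_BGN_1964,mon,11005031,V,Orhon,Orhon,11005352,VS,"Орхон","Орхон"
'''


def classify_all(text):
    return [classify(row) for row in csv.reader(io.StringIO(text))]


print(classify_all(SAMPLE))  # ['han_source', 'ok', 'swapped']
```

Running the same function over the whole CSV file would give a list of candidate rows to review by hand.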

ronaldtse commented 3 years ago

@javkhaanj7 this is good. Could you actually help go through the Mongolian database to ensure the transliteration data is correct? You have to go to to download all names of Mongolia to check. Thanks!
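A full review could be narrowed down with a rough screen for the х problem reported above. The sketch below assumes, as that report states, that х should be romanized as "h" under this system, so a "kh" in the Latin name is suspicious unless the Cyrillic name really contains the sequence "кх". The function name `suspicious_kh` is hypothetical, and the heuristic will miss other kinds of errors.

```python
def suspicious_kh(source, target):
    """Flag a pair whose Latin name has 'kh' where the Cyrillic name has х.

    Assumption: х -> h under this system, so 'kh' is only expected
    when the Cyrillic name contains the letters к and х in sequence.
    """
    src, tgt = source.lower(), target.lower()
    return "х" in src and "kh" in tgt and "кх" not in src


# The "before" and "after" names from the Сүхбаатар example.
pairs = [("Сүхбаатар", "Sükhbaatar"), ("Сүхбаатар", "Sühbaatar")]
flags = [suspicious_kh(s, t) for s, t in pairs]
print(flags)  # [True, False]
```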

ronaldtse commented 3 years ago

Issue 1 is described in #3. We need to detect where the GeoNames database is incorrect in its mention of the transliteration system. For example, it marks the mon_Cyrl2Latn_BGN_1964 system for 吉兰泰镇 => Jarantai Zhen, which is actually the bgnpcgn-zho-Hans-Latn-1964 system in Interscript (so we need to make a mapping between OGC codes, see

Issue 3 is described in #4. The full names are in the wrong order, perhaps due to a coding problem when we generate the smaller datasets, since the "pairs" CSV files were processed from the original database.
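Until the generation step is fixed, reversed rows could be patched after the fact. This is a sketch only, mirroring the replacement proposed in the original report: it swaps the two name pairs and leaves the ID and type fields where they are. Whether the ID and type fields should move together with the names is not settled in this thread. The field positions (names at indexes 4-5 and 8-9) and the helper names are assumptions based on the example rows.

```python
import csv
import io


def has_cyrillic(text):
    # Cyrillic and Cyrillic Supplement blocks (U+0400..U+052F).
    return any("\u0400" <= ch <= "\u052f" for ch in text)


def swap_names_if_reversed(row):
    """Return a copy with the name pairs swapped when Latin comes first.

    Assumed layout: fields 4-5 hold the source names and fields 8-9
    hold the target names. IDs and type codes are left untouched.
    """
    if not has_cyrillic(row[4]) and has_cyrillic(row[8]):
        fixed = list(row)
        fixed[4:6], fixed[8:10] = row[8:10], row[4:6]
        return fixed
    return row


line = 'mon_Cyrl2Latn_BGN_1964,mon,11005031,V,Orhon,Orhon,11005352,VS,"Орхон","Орхон"'
row = next(csv.reader(io.StringIO(line)))
fixed = swap_names_if_reversed(row)
print(fixed[4], fixed[8])  # Орхон Orhon
```

Rows that already have the Cyrillic name first are returned unchanged, so the function can be applied to every row of the file.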