namsor / namsor-tools-v2

NamSor command line tools to append gender, origin, diaspora or US 'race'/ethnicity to a CSV file.
GNU Lesser General Public License v3.0

ORIGIN : MK names in Cyrillic classified as BG #12

Status: Closed (namsor closed 3 years ago)

namsor commented 3 years ago

With NamSor Origin API v2.0.11, MK (North Macedonia) names are classified as BG (Bulgarian).

namsor commented 3 years ago

It is not clear how well onomastics can differentiate between MK and BG names; cf. current affairs: https://www.lemonde.fr/international/article/2020/11/18/querelle-linguistique-heros-dispute-pourquoi-la-bulgarie-entrave-la-marche-de-la-macedoine-du-nord-vers-l-europe_6060213_3210.html

namsor commented 3 years ago

2002 Census data is available here: http://www.stat.gov.mk/pdf/kniga_13.pdf

namsor commented 3 years ago

Fix will be in v2.0.12:

Origin  Recall    Precision  F-Score
MK      0.994253  0.999674   0.996956
BG      0.986087  0.799879   0.883276
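
For readers who want to check the figures, here is a minimal sketch that recomputes the F-Score from the reported precision and recall. It assumes the F-Score is the standard F1 (the harmonic mean of precision and recall), which the thread does not state explicitly; the function name `f_score` and the `reported` dictionary are illustrative and are not part of the NamSor tools.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Figures copied from the comment above (assumed to be F1, see lead-in).
reported = {
    "MK": {"recall": 0.994253, "precision": 0.999674, "f_score": 0.996956},
    "BG": {"recall": 0.986087, "precision": 0.799879, "f_score": 0.883276},
}

for origin, metrics in reported.items():
    computed = f_score(metrics["precision"], metrics["recall"])
    print(f"{origin}: computed {computed:.6f}, reported {metrics['f_score']:.6f}")
```

Under that assumption, the computed values agree with the reported F-Scores to six decimal places. The lower BG precision, by definition, means that about 20% of the names labelled BG in this evaluation did not actually belong to the BG class.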