sindresorhus / transliterate

Convert Unicode characters to Latin characters using transliteration
MIT License
286 stars, 20 forks

Case mismatch compared with String.normalize("NFD") when decomposing diacritics into canonical unicode points #8

Closed: danielweck closed this issue 4 years ago

danielweck commented 4 years ago

FWIW, I automated some comparative checks on this repository's mappings: https://github.com/sindresorhus/transliterate/blob/master/replacements.js

...and I discovered the following case inconsistencies compared with String.normalize() and "NFD" (Canonical Decomposition):

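To reproduce from a shell (the expression inside console.log() can also be pasted directly into a JavaScript console): node -e "console.log('Ş'.normalize('NFD').replace(/[\u0300-\u036f]/g, ''));"

This prints S, the uppercase base letter that remains once NFD has split off the combining cedilla (U+0327) and the regex has stripped it.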

Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
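For illustration, the one-liner above can be wrapped in a small helper. This is only a sketch of the NFD-and-strip technique from the reproduction step, not code from this repository; the name stripDiacritics is made up for the example.

```javascript
// Hypothetical helper (not part of transliterate): decompose with NFD, then
// drop combining diacritical marks in the range U+0300 to U+036F.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// The case of the base letter is preserved by canonical decomposition.
console.log(stripDiacritics('\u015E')); // Ş (capital S with cedilla) → S
console.log(stripDiacritics('\u015F')); // ş (small s with cedilla) → s
console.log(stripDiacritics('\u00C9')); // É (capital E with acute) → E
```

Because NFD never changes the case of the base letter, any mapping whose replacement differs from this result only by case is a likely candidate for the inconsistencies described in this issue.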

sindresorhus commented 4 years ago

> FWIW, I automated some comparative checks on this repository's mappings: https://github.com/sindresorhus/transliterate/blob/master/replacements.js

Would be great to have an automated test for this.
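One possible shape for such a test, as a rough sketch only: it assumes the mappings can be iterated as [character, replacement] pairs, and it uses made-up sample data rather than the real contents of replacements.js. The names nfdBase, findCaseMismatches, and samplePairs are hypothetical.

```javascript
// Base text left after canonical decomposition (NFD) once combining marks
// in the range U+0300 to U+036F are removed.
function nfdBase(character) {
  return character.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Report pairs whose replacement differs from the NFD base only by case.
function findCaseMismatches(pairs) {
  const mismatches = [];
  for (const [character, replacement] of pairs) {
    const expected = nfdBase(character);
    // Unchanged by NFD-and-strip: no canonical base letter to compare against.
    if (expected === character) {
      continue;
    }
    const differsOnlyByCase =
      expected !== replacement &&
      expected.toLowerCase() === replacement.toLowerCase();
    if (differsOnlyByCase) {
      mismatches.push({character, replacement, expected});
    }
  }
  return mismatches;
}

// Hypothetical sample data, NOT the actual contents of replacements.js.
const samplePairs = [
  ['\u015E', 's'], // Ş mapped to lowercase s: flagged
  ['\u015F', 's'], // ş mapped to s: consistent
  ['\u00C9', 'E'], // É mapped to E: consistent
  ['\u00DF', 'ss'], // ß has no canonical decomposition: skipped
];

console.log(findCaseMismatches(samplePairs));
```

A real test would import the repository's mapping list in place of samplePairs and assert that the returned array is empty, or that it contains only entries that are intentionally different.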