sindresorhus / transliterate

Convert Unicode characters to Latin characters using transliteration
MIT License

Hyphens / dashes normalization? #11

Open danielweck opened 4 years ago

danielweck commented 4 years ago

There are many dash variants that could all be normalized to the ASCII hyphen-minus (-). See https://www.compart.com/en/unicode/category/Pd and http://jkorpela.fi/dashes.html

However, to keep things simple: https://en.wikipedia.org/wiki/Wikipedia:Hyphens_and_dashes

=>

["–", "-"], // U+2013 en dash
["—", "-"], // U+2014 em dash
["−", "-"], // U+2212 minus sign
["‒", "-"], // U+2012 figure dash
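
As a rough illustration of how such pairs could be applied, here is a minimal sketch. It assumes the entries are [from, to] tuples; the names `dashReplacements` and `applyDashReplacements` are hypothetical and not part of the package.

```javascript
// Hypothetical replacement pairs, written as [from, to] tuples.
// Escapes are used because these characters are hard to tell apart visually.
const dashReplacements = [
  ['\u2013', '-'], // en dash
  ['\u2014', '-'], // em dash
  ['\u2212', '-'], // minus sign
  ['\u2012', '-'], // figure dash
];

// Apply every pair in turn (illustrative helper, not the library's API).
function applyDashReplacements(string) {
  let result = string;
  for (const [from, to] of dashReplacements) {
    result = result.replaceAll(from, to);
  }
  return result;
}

console.log(applyDashReplacements('2010\u20132020'));
```

An explicit table like this only touches the listed characters, so any other dash variant passes through unchanged.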
danielweck commented 4 years ago

Side note: there is already an "underscore" transliteration for a similar-looking Arabic character: https://github.com/sindresorhus/transliterate/blob/405813848bd8555efcaaab298c95ba2daf53cbdc/replacements.js#L245

sindresorhus commented 4 years ago

We can do:

string.replace(/\p{Dash_Punctuation}/gu, '-');

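to cover every character in the Dash_Punctuation (Pd) general category. Note that "−" (U+2212 MINUS SIGN) is in the Sm category, not Pd, so it would still need its own replacement entry.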


Full reference for the property value names: https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
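
A minimal sketch of the property-escape approach, assuming a hypothetical helper named `normalizeDashes` (not part of the package). Since U+2212 MINUS SIGN has General_Category Sm rather than Pd, the sketch adds it to the character class explicitly. Whether the library should treat the minus sign as a dash is a separate decision.

```javascript
// Hypothetical helper; a sketch, not the library's implementation.
function normalizeDashes(string) {
  // \p{Dash_Punctuation} (short form: \p{Pd}) only works with the `u` flag.
  // U+2212 MINUS SIGN is not in Pd, so it is listed separately.
  return string.replace(/[\p{Dash_Punctuation}\u2212]/gu, '-');
}

console.log(normalizeDashes('a\u2013b\u2014c\u2212d\u2012e'));
```

Compared with an explicit table, the property escape also picks up less common Pd characters such as U+2010 HYPHEN and U+FF0D FULLWIDTH HYPHEN-MINUS, which may or may not be wanted.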