sindresorhus / transliterate

Convert Unicode characters to Latin characters using transliteration
MIT License

Missing French "œuf" => "oeuf" (O+E ligature) #7

Open danielweck opened 4 years ago

danielweck commented 4 years ago

References:

https://en.wikipedia.org/wiki/%C5%92

https://en.wiktionary.org/wiki/%C5%93uf

Mappings:

https://github.com/sindresorhus/transliterate/blob/master/replacements.js
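For illustration, here is a minimal sketch of what the missing entries could look like. It assumes the mappings are a list of `[character, replacement]` pairs applied in order. The `replacements` array and the `transliterateSketch` helper are hypothetical: they are not the library's actual code or API, and the real replacements.js may be laid out differently.

```javascript
// Hypothetical mapping table of [character, ASCII replacement] pairs.
// The real replacements.js may be structured differently.
const replacements = [
  ['Œ', 'OE'],
  ['œ', 'oe'],
];

// Hypothetical helper that applies every mapping to the input string.
// This is NOT the library's transliterate() implementation.
function transliterateSketch(text) {
  let result = text;
  for (const [from, to] of replacements) {
    // split/join replaces every occurrence without needing a regex
    result = result.split(from).join(to);
  }
  return result;
}

console.log(transliterateSketch('œuf'));   // → "oeuf"
console.log(transliterateSketch('Œuvre')); // → "OEuvre"
```

Mapping uppercase Œ to "OE" produces "OEuvre" for a capitalized word. A real implementation might prefer "Oe" in that position, which this sketch does not attempt.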

danielweck commented 4 years ago

Interestingly, it is also missing from: https://github.com/diacritics/database/blob/dist/v1/diacritics.json

sindresorhus commented 4 years ago

Indeed. A friend of mine confirmed this:

It's French for egg. But indeed, usually written as oeuf. That's how I learned to write it in school, not with the o + e ligature.

ehmicky commented 4 years ago

In the opposite direction (oe -> œ), I don't think one can replace every oe with œ in French. In French, "oe" is pronounced as two separate vowels, whereas œ is pronounced as a single sound. Words like "œuf" should be written with œ, but words like "moelleux" should not. See https://en.wikipedia.org/wiki/%C5%92#French

The same goes for English, which uses œ in some words of Latin origin, while other English words, like "toe", contain the plain letters "oe".

However, in the œ -> oe direction, in both cases, I think it would be ok to replace œ by oe for the sake of this library purpose.
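The asymmetry described above can be seen in a few lines. Mapping œ to oe is safe because the output is plain ASCII either way, but the reverse is ambiguous: "oeuf" (from "œuf") and "moelleux" both contain "oe", and only the first should become "œ". This is a minimal, self-contained sketch; `toAscii` and `naiveInverse` are hypothetical helpers that handle only the œ/Œ ligature.

```javascript
// Hypothetical one-way mapping that only handles the œ/Œ ligature.
function toAscii(text) {
  return text.replace(/œ/g, 'oe').replace(/Œ/g, 'OE');
}

// Forward direction: both spellings collapse to the same ASCII form.
console.log(toAscii('œuf'));      // → "oeuf"
console.log(toAscii('oeuf'));     // → "oeuf"

// "moelleux" already contains a plain "oe" and is left alone.
console.log(toAscii('moelleux')); // → "moelleux"

// A naive inverse wrongly rewrites "moelleux", which is why the
// oe -> œ direction cannot be a simple character replacement.
const naiveInverse = (text) => text.replace(/oe/g, 'œ');
console.log(naiveInverse('moelleux')); // → "mœlleux"
```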

danielweck commented 4 years ago

True.

However, there are use cases that require some kind of normalization to a canonical representation, or, in the case of "search", there need to be equivalence rules for the input text ("needle") in order to match varying occurrences in the target text ("haystack"). Handling lowercase vs. uppercase is an obvious example, but there are also Unicode ligatures and surrogate pairs, diacritics / accented characters, whitespace normalization, etc.
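As a rough illustration of such equivalence rules, here is a sketch of a search normalizer. It assumes that case, diacritics, the œ ligature and runs of whitespace should all be treated as insignificant. It uses the standard `String.prototype.normalize('NFKD')`, which splits accented letters into a base letter plus combining marks (and expands compatibility ligatures such as "ﬁ"), and the `\p{M}` Unicode property escape to drop those marks. Unicode gives œ (U+0153) no decomposition, so NFKD leaves it unchanged and it has to be expanded explicitly. `normalizeForSearch` and `matches` are hypothetical names, not part of any library.

```javascript
// Hypothetical search normalizer: strips diacritics, folds case,
// expands the œ ligature and collapses whitespace.
function normalizeForSearch(text) {
  return text
    .normalize('NFKD')      // "é" -> "e" + U+0301; "ﬁ" -> "fi"
    .replace(/\p{M}/gu, '') // drop combining marks (the accents)
    .toLowerCase()          // "Œ" -> "œ", "U" -> "u"
    .replace(/œ/g, 'oe')    // NFKD leaves œ intact, so expand it here
    .replace(/\s+/g, ' ')   // collapse runs of whitespace
    .trim();
}

// Hypothetical helper: does the needle occur in the haystack once
// both are normalized the same way?
function matches(haystack, needle) {
  return normalizeForSearch(haystack).includes(normalizeForSearch(needle));
}

console.log(normalizeForSearch('Œuf  Brûlé'));                 // → "oeuf brule"
console.log(matches('Recette : un ŒUF poché', 'oeuf poche')); // → true
```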

More specifically: the scope of the sindresorhus/transliterate project is to eliminate Unicode characters and replace them with some equivalent ASCII-based form, so I think it is perfectly acceptable to replace all occurrences of œ with oe.

ehmicky commented 4 years ago

I agree:

I think it would be ok to replace œ by oe for the sake of this library purpose.

danielweck commented 4 years ago

Yes, sorry, I was just adding thoughts about use cases; I didn't mean to imply that you were against the "oe" replacement :) (my bad for creating confusion)