lovell / hepburn

Node.js module for converting Japanese Hiragana and Katakana script to, and from, Romaji using Hepburn romanisation
Apache License 2.0

Discussion: long vowels to and from hiragana transcriptions #17

Open risseraka opened 4 years ago

risseraka commented 4 years ago

Hi there,

Me again cheers.

As far as I understand after some reading (I admit, I started from Wikipedia and then did some more research), the chōonpu (ー) is extremely rare in hiragana (e.g. in some manga, to add emphasis). So it seems normal that it does not appear among the hiragana trigrams in src/hepburn.js.

Moreover, Ō can apparently be written as either OU (おう) or OO (おお) in hiragana. E.g. TŌKYŌ is actually TOUKYOU (とうきょう), whereas ŌZEKI is OOZEKI (おおぜき).

toHiragana("CHŌ")
// could be either
// ちょう
// or
// ちょお
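To make the ambiguity concrete, here is a minimal sketch that lists every spelling a romaji string with Ō could expand to. `longOCandidates` is a hypothetical helper written for this discussion, not part of the hepburn module, and it only works on the romaji string itself:

```javascript
// Hypothetical helper (not part of the hepburn API): enumerate the possible
// expansions of each long O, since the romaji alone cannot say which is right.
function longOCandidates(romaji) {
  let candidates = [""];
  // NFC so that "O" + combining macron is treated as the single character "Ō".
  for (const ch of romaji.normalize("NFC")) {
    const options =
      ch === "Ō" ? ["OU", "OO"] :
      ch === "ō" ? ["ou", "oo"] :
      [ch];
    // Extend every candidate built so far with every option for this character.
    candidates = candidates.flatMap((prefix) => options.map((opt) => prefix + opt));
  }
  return candidates;
}

console.log(longOCandidates("CHŌ"));
// [ 'CHOU', 'CHOO' ]
console.log(longOCandidates("TŌKYŌ").length);
// 4
```

The number of candidates doubles with each Ō, so picking the right one would need a dictionary or some other outside knowledge.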

That being said, the lib could probably take an extra step and transcribe long vowels in valid romaji when there is no ambiguity (i.e. Ā, Ē, Ī and Ū). E.g.:

toHiragana("CHĀ")
// ちゃあ
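One possible way to do this, as a rough sketch: expand the unambiguous macron vowels into doubled vowels before the string reaches the existing conversion. `expandUnambiguousMacrons` is a hypothetical pre-processing helper, not something the module currently exposes, and it deliberately leaves Ō alone:

```javascript
// Hypothetical pre-processing step (not part of the hepburn API).
// Only the vowels assumed above to be unambiguous are expanded; Ō/ō is left untouched.
const UNAMBIGUOUS_LONG_VOWELS = {
  "Ā": "AA", "Ē": "EE", "Ī": "II", "Ū": "UU",
  "ā": "aa", "ē": "ee", "ī": "ii", "ū": "uu",
};

function expandUnambiguousMacrons(romaji) {
  return romaji
    // NFC turns a vowel followed by a combining macron into one precomposed character.
    .normalize("NFC")
    .replace(/[ĀĒĪŪāēīū]/g, (ch) => UNAMBIGUOUS_LONG_VOWELS[ch]);
}

console.log(expandUnambiguousMacrons("CHĀ"));
// CHAA
console.log(expandUnambiguousMacrons("TŌKYŌ"));
// TŌKYŌ (unchanged)
```

The expanded string (e.g. CHAA) could then be handed to toHiragana, which should be able to treat it as CHA followed by A.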

On the other hand, repeated vowels in hiragana cannot safely be transcribed in romaji using the macron diacritic (¯). E.g.

fromKana("からあげ")
// KARAAGE
// 唐(から)揚(あ)げ
// KARĀGE would be wrong here

So, transcribing long vowels from romaji to hiragana and back could yield a different romaji, breaking symmetrical transcription.
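A small sketch of why the reverse direction is unsafe: a naive rule that collapses every doubled vowel into a macron cannot tell a true long vowel from two vowels that merely sit next to each other. `naiveCollapse` is a hypothetical function used only to illustrate the problem:

```javascript
// Hypothetical, deliberately naive rule (not part of the hepburn API):
// collapse any doubled A, E, I or U into the corresponding macron vowel.
const MACRONS = { AA: "Ā", EE: "Ē", II: "Ī", UU: "Ū" };

function naiveCollapse(romaji) {
  return romaji.replace(/AA|EE|II|UU/g, (pair) => MACRONS[pair]);
}

// おかあさん: a genuine long vowel, so the macron is fine.
console.log(naiveCollapse("OKAASAN"));
// OKĀSAN

// からあげ: から + あげ, two separate vowels, so the macron is wrong.
console.log(naiveCollapse("KARAAGE"));
// KARĀGE
```

Both inputs look the same to the rule, so the romaji string alone does not carry enough information to decide.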

WDYT?

Cheers.

lovell commented 4 years ago

I'm unsure if a true symmetric relationship is possible with transliteration, especially when attempting to deal with both spoken as well as written language.

Do the OO vs OU variants relate to wāpuro? https://en.wikipedia.org/wiki/W%C4%81puro_r%C5%8Dmaji

Definitely happy for improvements for situations where there is no ambiguity, as in the examples you point out. I'll add you as a contributor to this repo so you can help out.