Closed Akeru closed 11 years ago
This is an interesting request which is worth a discussion :)
From my point of view, the rules for converting long vowels (as well as particles) are defined by the romanization system and should therefore remain in the romanization classes (i.e. Hepburn, Kunrei, etc.).
So using Hepburn, 東京 should always be Tōkyō, and using Kunrei, 東京 should always be Tôkyô.
I agree this deviates a bit from the "standards", but these are only "so-called" standards.
In real life you see a funny mix of all of them, which is expected since the Japanese Foreign Ministry itself allows some (all?) of the given variants for official documents (on passports you can see さとう written Satoh. Note: in this passport mode, only long O is handled; the others are simply ignored).
I think it should be possible to handle this in some way, i.e. the default long vowel style should match the standard but could be overridden.
Another neat feature would be the possibility to turn on/off the use of "m" before "p" and "b" when using Hepburn, so しんぶん could be romanized as shinbun or shimbun. This, again, because quite often both romanizations exist. (Yes, Japanese romanization is a mess :smile: )
That's why PHP stands out for this library...
So, what do you think of this ?
Could be an interesting feature, but I am debating how to implement this (and I lack time :D). I think the best way would be to update the Romanization class and pass all those settings (how to handle long vowels, particles, "m" before "p" and "b", etc.) as member variables.
Each romanization system class will then define its own default settings, which could be overridden, if needed, before calling the transliterate method.
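To make the "defaults per system, overridable before transliterating" idea concrete, here is a minimal sketch. All class, property and method names (`RomanizationSketch`, `useMBeforeLabials`, `applySettings`, ...) are hypothetical and not the library's actual API, and the default chosen for each system is only an assumption for illustration. It only post-processes an already romanized string; the kana conversion itself is left out.

```php
<?php
// Hypothetical sketch: names and defaults are illustrative, not the library's API.

abstract class RomanizationSketch
{
    /** Whether syllabic "n" is written "m" before "b" and "p". */
    protected bool $useMBeforeLabials = false;

    /** Overrides the system's default before transliterating. */
    public function setUseMBeforeLabials(bool $flag): self
    {
        $this->useMBeforeLabials = $flag;
        return $this;
    }

    /** Applies the current settings to an already romanized string. */
    public function applySettings(string $romaji): string
    {
        if ($this->useMBeforeLabials) {
            // "n" directly followed by "b" or "p" becomes "m".
            return preg_replace('/n(?=[bp])/', 'm', $romaji) ?? $romaji;
        }
        return $romaji;
    }
}

class HepburnSketch extends RomanizationSketch
{
    // Assumed default for this sketch: "m" before "b"/"p".
    protected bool $useMBeforeLabials = true;
}

class KunreiSketch extends RomanizationSketch
{
    // Inherits the base default (plain "n").
}

$hepburn = new HepburnSketch();
echo $hepburn->applySettings('shinbun'), "\n";   // shimbun
$hepburn->setUseMBeforeLabials(false);
echo $hepburn->applySettings('shinbun'), "\n";   // shinbun
```

Long vowel and particle handling could be exposed the same way, one property per setting.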
I set this to the milestone 0.5 as well.
On Jun 26, 2013 1:24 PM, "Axel Bodart" wrote:
> So, what do you think of this ?
Having said that, I am thinking of refactoring (again :D) the Transliterator component.
- TransliterationSystemInterface* (interface)
  - Romaji* (abstract class)
    - Hepburn
    - Kunrei
    - Nihon
    - Wapuro
    - JSL
  - Kana* (abstract class)
    - Hiragana
    - Katakana
TransliterationSystemInterface is the old RomanizationInterface.
Romaji is the old Romanization abstract class.
The Kana class will be split into 2 sub-classes, following the same architecture as the Romaji abstract class.
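As a rough skeleton of the hierarchy listed above: the class names come from the list, but the `transliterate(string): string` signature, the `getMapping()` helper and the tiny mapping tables are assumptions made only to keep the sketch runnable. Nihon, Wapuro and JSL would follow the same pattern as Hepburn and Kunrei.

```php
<?php
// Skeleton only: signatures and mapping tables are assumptions for illustration.

interface TransliterationSystemInterface
{
    public function transliterate(string $str): string;
}

abstract class Romaji implements TransliterationSystemInterface
{
    /** Kana-to-Latin table supplied by each romanization system. */
    abstract protected function getMapping(): array;

    public function transliterate(string $str): string
    {
        // strtr() tries the longest keys first, so "じゃ" wins over shorter keys.
        return strtr($str, $this->getMapping());
    }
}

class Hepburn extends Romaji
{
    protected function getMapping(): array
    {
        return ['し' => 'shi', 'つ' => 'tsu', 'じゃ' => 'ja'];
    }
}

class Kunrei extends Romaji
{
    protected function getMapping(): array
    {
        return ['し' => 'si', 'つ' => 'tu', 'じゃ' => 'zya'];
    }
}

abstract class Kana implements TransliterationSystemInterface
{
    /** Latin-to-kana table supplied by each syllabary. */
    abstract protected function getMapping(): array;

    public function transliterate(string $str): string
    {
        return strtr($str, $this->getMapping());
    }
}

class Hiragana extends Kana
{
    protected function getMapping(): array
    {
        return ['shi' => 'し', 'tsu' => 'つ'];
    }
}

class Katakana extends Kana
{
    protected function getMapping(): array
    {
        return ['shi' => 'シ', 'tsu' => 'ツ'];
    }
}
```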
Let's see ! :laughing:
I still have to work on the settings part to specify how to handle long vowels, particles, "m" before "p" and "b", etc.
I have some comments on this :smile: Would you prefer that I wait a bit (as you might have some more commits pending), or can I start?
Please see https://github.com/Akeru/jpnforphp/commit/0e9e05cf3398b1c4dd3fa7271430aadd13a603b8 for a Kana refactoring proposal. It will be easier this way (instead of spamming the issue's comments).
Sorry for the delay...
Off topic: I was kind of away for the past 3 months, but I'm a happy, freshly married guy and I'm back now :)
I gave it some thought. What we are actually saying is that those classes share the same methods but use different inputs, right? And this is true for both kana and romaji transliteration.
Romaji:
- transliterateSokuon
- transliterateChoonpu
- convertLongVowels
- convertParticles
Kana:
- prepareTransliteration
- transliterateSokuon
- transliterateQuotationMarks
Why don't we just use generic Romaji and Kana classes and populate those methods with inputs coming from configuration files (like YAML)? We would have the following files:
- Romaji.php
- Kana.php
- Hepburn.yml
- Kunrei.yml
- Nihon.yml
- Wapuro.yml
- Hiragana.yml
- Katakana.yml
... maybe put all those YAML files into some subfolders.
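A possible shape for this, as a sketch only: one generic class whose behaviour is driven entirely by configuration. The plain PHP arrays below stand in for the parsed contents of Hepburn.yml and Kunrei.yml (loading YAML would need an extension or a library, which is left out here), and the class name `GenericRomaji` and the keys `mapping` and `long_vowels` are assumptions, not the library's file format.

```php
<?php
// Sketch: arrays stand in for parsed YAML files; class name and keys are assumptions.

final class GenericRomaji
{
    private array $mapping;
    private array $longVowels;

    public function __construct(array $config)
    {
        $this->mapping = $config['mapping'];
        $this->longVowels = $config['long_vowels'];
    }

    public function transliterate(string $kana): string
    {
        // Step 1: direct kana-to-Latin conversion.
        $romaji = strtr($kana, $this->mapping);
        // Step 2: system-specific long vowel handling.
        return $this->convertLongVowels($romaji);
    }

    private function convertLongVowels(string $romaji): string
    {
        return strtr($romaji, $this->longVowels);
    }
}

// Same kana table for this tiny example; only the long vowel rules differ.
$kanaTable = ['と' => 'to', 'う' => 'u', 'きょ' => 'kyo'];

$hepburnConfig = [
    'mapping'     => $kanaTable,
    'long_vowels' => ['ou' => 'ō', 'oo' => 'ō'],
];
$kunreiConfig = [
    'mapping'     => $kanaTable,
    'long_vowels' => ['ou' => 'ô', 'oo' => 'ô'],
];

$hepburn = new GenericRomaji($hepburnConfig);
$kunrei  = new GenericRomaji($kunreiConfig);

echo $hepburn->transliterate('とうきょう'), "\n"; // tōkyō
echo $kunrei->transliterate('とうきょう'), "\n";  // tôkyô
```

With this layout, adding a new system would mean adding a new configuration file rather than a new class.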
Yes indeed, that sounds good to me :smile:
I think we have a well-designed component here which can easily be customized and overridden to define one's own transliteration system. I'm closing the issue; feel free to file a new issue or enhancement request regarding this code.
It could be useful to have a way to specify both how the romanization should be handled and how to convert long vowels.
The romanization style would only dictate how to convert "direct" sounds: Hepburn (shi/tsu/ja) vs Kunrei (si/tu/zya).
The long vowel style could be: macron, circumflex, nothing, double, "h", or none, as in Tōkyō, Tôkyô, Tokyo, Tookyoo, Tohkyoh, Toukyou respectively.
This tweaks the romanization a bit, but since there is no practical "standard", it could cover more corner cases.
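The long vowel styles listed above could be sketched as a single function, independent of the romanization style. This is only an illustration: the function name `convertLongO` is hypothetical, only long "o" is handled, and the input is assumed to be unconverted romaji such as "toukyou". A naive replacement like this would also hit "ou" sequences that are not long vowels, so a real implementation would need more context.

```php
<?php
// Sketch: hypothetical helper, long "o" only, input assumed to be unconverted romaji.

function convertLongO(string $romaji, string $style): string
{
    // Replacement for a long "o" in each style.
    $replacements = [
        'macron'     => 'ō',
        'circumflex' => 'ô',
        'nothing'    => 'o',
        'double'     => 'oo',
        'h'          => 'oh',
    ];

    if ($style === 'none') {
        return $romaji; // leave "ou" as written
    }
    if (!isset($replacements[$style])) {
        throw new InvalidArgumentException("Unknown long vowel style: $style");
    }

    $long = $replacements[$style];
    // strtr() replaces in a single pass, so a replacement is never re-scanned.
    return strtr($romaji, ['ou' => $long, 'oo' => $long]);
}

foreach (['macron', 'circumflex', 'nothing', 'double', 'h', 'none'] as $style) {
    echo $style, ': ', convertLongO('toukyou', $style), "\n";
}
```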