msikma / pokesprite

Database project of box and inventory sprites from the Pokémon core series games
https://msikma.github.io/pokesprite/
MIT License
958 stars 164 forks

Add direct romaji conversions of Japanese names #89

Closed msikma closed 3 years ago

msikma commented 3 years ago

This should solve the problems raised in #85 and #88.

lacolaco commented 3 years ago

Thank you for your nice work!

This change, from ā to aa, does not work well in some cases. For example, アーマーガア (Corviknight) becomes "aamaagaa" under this rule, but the original name cannot be recovered from it because "アアマアガア" and "アーマーガー" produce the same result. The same problem affects ホーホー (Hoohoo) and many other Pokémon.
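To make the collision concrete, here is a minimal Python sketch of the vowel-doubling rule. It is not the project's actual converter: the `KANA` table and the `romanize_doubled` function are hypothetical and cover only the kana needed for these examples.

```python
# Hypothetical mini-table: only the kana used in the examples below.
KANA = {"ア": "a", "マ": "ma", "ガ": "ga", "ホ": "ho"}
LONG_MARK = "ー"


def romanize_doubled(kana: str) -> str:
    """Romanize katakana, writing the long vowel mark as a repeated vowel."""
    out = []
    for ch in kana:
        if ch == LONG_MARK:
            if not out:
                raise ValueError("long vowel mark with no preceding kana")
            # Repeat the vowel that ends the previous syllable.
            out.append(out[-1][-1])
        else:
            out.append(KANA[ch])
    return "".join(out)


# Three different kana spellings to compare under the doubling rule.
candidates = ["アーマーガア", "アアマアガア", "アーマーガー"]
results = {k: romanize_doubled(k) for k in candidates}
for kana, romaji in results.items():
    print(kana, "->", romaji)
```

All three spellings come out as the same string, so the kana cannot be reconstructed from the doubled form alone.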

msikma commented 3 years ago

> This change, from ā to aa, does not work well in some cases. For example, アーマーガア (Corviknight) becomes "aamaagaa" under this rule, but the original name cannot be recovered from it because "アアマアガア" and "アーマーガー" produce the same result. The same problem affects ホーホー (Hoohoo) and many other Pokémon.

You're absolutely right. The reason I implemented it this way is because, normally, with long vowels, you can't tell the difference either...アーマーガア becomes āmāgā in Hepburn romanization, right? I just doubled the vowels to make it easier to type. I've always been taught that romaji is not lossless by nature. Like ō being used for おう and おお. So I believed this would be the most "standard" way to do it.

Of course, we could change the long vowel markers to dashes. Although I feel either way we go the frontend will need to be lenient in matching. Some people will type 'aamaagaa' and some will type 'a-ma-gaa', and I feel both versions should be matched, right?
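One way a frontend could be lenient is to fold every spelling to a loose comparison key before matching. The sketch below is only an illustration under that assumption; `match_key` and `matches` are hypothetical names, not part of pokesprite.

```python
import re
import unicodedata


def match_key(romaji: str) -> str:
    """Fold a romaji spelling to a loose key used only for comparison."""
    # Decompose macron vowels (ā -> a + combining macron), then drop the marks.
    decomposed = unicodedata.normalize("NFD", romaji.lower())
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop hyphens typed in place of the long vowel mark.
    text = text.replace("-", "")
    # Collapse runs of the same vowel so "aa" and "a" compare equal.
    return re.sub(r"([aeiou])\1+", r"\1", text)


def matches(query: str, name: str) -> bool:
    """Return True if two spellings fold to the same key."""
    return match_key(query) == match_key(name)


print(matches("a-ma-gaa", "aamaagaa"))  # both fold to "amaga"
print(matches("āmāgā", "aamaagaa"))
```

The key is deliberately lossy and will over-match names that differ only in vowel length, and it does not treat "ou" and "ō" as equal, so it suits search rather than use as a unique identifier.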

I'm actually a little surprised to learn that people typing dashes is so common. If you don't mind my asking, do native speakers do this when typing romaji? This is a nice learning opportunity for me because I've actually studied Japanese in Tokyo for over a year (and I hope to be back as soon as it's possible) and I didn't know about this. 🙂

What about changing it to this?

And then frontends can decide for themselves whether to display/match it as 'a-ma-gaa' or 'aamaagaa' or 'āmāgā'.

lacolaco commented 3 years ago

I'm not an expert in romaji, but I believe "āmāgaa" is the most standard Hepburn romanization of "アーマーガア". As you say, using hyphens (dashes) is not correct under the Hepburn rules. By the way, this is the official rule used for Japanese passports. But if you follow that rule strictly, "アーマーガア" becomes "Amagaa" and "ホウオウ" becomes "Hoo", which doesn't make sense. So I think romanization in this project has to be optimized for Pokémon names.

I did say that ā is not easy to type, but honestly, it's not a serious matter because I've been able to convert it programmatically in my application (sorry!).

I think for name.jpn_ro, slug.jpn, slug.jpn_ro and the other fields, it would be better to define a use case for each one: display, unique key, search, or something else? I'm not sure what is best, but at least I was able to use jpn_ro even before the #85 changes without any serious problem.

msikma commented 3 years ago

> I'm not an expert in romaji, but I believe "āmāgaa" is the most standard Hepburn romanization of "アーマーガア".

This really surprised me because it's different from what I thought was the case. I always thought that アー and アア both become ā. That's why ケーキ is kēki and とうきょう is Tōkyō. That's how I almost always see words transliterated. Even the English Pokémon wiki uses "āmāgā".

I see now that, according to Wikipedia, there is an exception for loanwords written in katakana: a vowel that is written out twice does not get a macron. Their example is バレエ becoming "baree" instead of "barē".

It's weird, though, because many sources do not mention this exception. Before, I was using a PDF from Tokyo University on how to do romanization, and even that doesn't mention it. I'm actually a little shocked that I never knew this in all the years I've been learning Japanese. I've searched quite a bit but can't find any reference to this rule except on Wikipedia, and the Wikipedia article is unreferenced. Really weird.

Well, either way...

> I think for name.jpn_ro, slug.jpn, slug.jpn_ro and the other fields, it would be better to define a use case for each one: display, unique key, search, or something else?

My original idea behind the slugs was that they could be used verbatim in, for example, CSS class names, or any other place that favors Latin characters. In the past this used to be the browser's address bar as well, but that's no longer the case today. In pokesprite-spritesheet (still wip) I use the English slugs to generate CSS class names, and I want to be able to do the same for Japanese and other languages too.
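As a rough illustration of why the slugs need to stay plain ASCII, the sketch below builds a CSS class name from a slug and rejects anything that could not be used verbatim. The `pkmn` prefix, the `SLUG_RE` pattern, and the `css_class` helper are assumptions for this example and are not taken from pokesprite-spritesheet.

```python
import re

# Assumed slug shape: lowercase ASCII letters and digits, hyphen-separated.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def css_class(slug: str, prefix: str = "pkmn") -> str:
    """Build a class name from a slug, refusing slugs that are not plain ASCII."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"slug is not usable verbatim as a class name: {slug!r}")
    return f"{prefix}-{slug}"


print(css_class("aamaagaa"))
```

Under this assumption a romaji slug containing macrons, such as "āmāgā", would be rejected, which is one argument for keeping slug.jpn_ro in ASCII even if name.jpn_ro keeps its long vowel marks.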

The jpn_ro key wasn't originally supposed to be very important, since I felt the original jpn kana version would be much more important.

lacolaco commented 3 years ago

> This really surprised me because it's different from what I thought was the case. I always thought that アー and アア both become ā. That's why ケーキ is kēki and とうきょう is Tōkyō. That's how I almost always see words transliterated. Even the English Pokémon wiki uses "āmāgā".

I think it's difficult to distinguish. This case is similar to the difference between "おかあさん" and "ばあい".

lacolaco commented 3 years ago

http://www.roomazi.org/99.html This 99-style romanization is close to my intuition. https://ja.m.wikipedia.org/wiki/99%E5%BC%8F%E3%83%AD%E3%83%BC%E3%83%9E%E5%AD%97

lacolaco commented 3 years ago

I love this project, and I respect the maintainer's decision. And this is just my opinion.

I guess that in most use cases jpn_ro is used simply as the Japanese name in Latin characters, so it's important to me that jpn_ro links to jpn as directly as possible. In most cases jpn_ro is not used for display, because no one needs it as a displayed name. English, Japanese, and Chinese each have their own locale-native name; jpn_ro is not native for anybody. So I think the value of jpn_ro is just that it consists only of Latin characters, which is computer-friendly, and that is what helps me in my application. What matters, then, is how closely jpn_ro links to jpn. In #85, I said "Youngoose" is not a good romanization. The reason is that "Youngoose" does not link well to "ヤングース", unlike "Yangūsu". And honestly, whether it is Hepburn or only Hepburn-ish is not a serious problem for me. What matters is how intuitive the link is.

The key is the amount of information. "āmāgā" loses the information about whether the original name is "アーマーガー" or "アアマアガア". So I feel "āmāgaa" is the best, but it's NOT a serious matter because we have the original name as jpn. There is no need to translate back to the Japanese name from jpn_ro. "āmāgā" is helpful enough. "aamaagaa" is also usable, but I don't prefer it because it loses the long vowel information completely. As the Tokyo Univ. PDF says, "ばあい" should become "baai" instead of "bāi"; "aa" does not always mean a long vowel. So I prefer to keep long vowels in jpn_ro, and that is probably difficult to do with only ASCII characters. I think long vowel marks like "ā" are needed for romanization.
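The "āmāgaa" style described here can be sketched as a rule that writes a macron only where the kana has a long vowel mark and leaves explicitly written vowels doubled. This is a minimal Python illustration, not the project's converter; the `KANA` table and `romanize_keep_long_marks` are hypothetical and cover only this example.

```python
import unicodedata

# Hypothetical mini-table: only the kana used in the example below.
KANA = {"ア": "a", "マ": "ma", "ガ": "ga"}
LONG_MARK = "ー"
COMBINING_MACRON = "\u0304"


def romanize_keep_long_marks(kana: str) -> str:
    """Romanize katakana, using a macron only where the kana has ー."""
    out = []
    for ch in kana:
        if ch == LONG_MARK:
            if not out:
                raise ValueError("long vowel mark with no preceding kana")
            # Attach a macron to the vowel that ends the previous syllable.
            out.append(COMBINING_MACRON)
        else:
            out.append(KANA[ch])
    # Compose vowel + combining macron into single characters such as ā.
    return unicodedata.normalize("NFC", "".join(out))


for name in ["アーマーガア", "アアマアガア", "アーマーガー"]:
    print(name, "->", romanize_keep_long_marks(name))
```

With this rule the three kana spellings map to three different strings, so the romanized form keeps the distinction that vowel doubling and uniform macrons both lose.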

This may be a little extreme, but I think it would also be OK to drop the jpn_ro fields from the JSON completely. Users can generate them from jpn for their own use case. The JSON payload would also be smaller and easier to maintain. This is just an idea, but I think pokesprite doesn't have to be something like a Pokédex that includes comprehensive information.

msikma commented 3 years ago

Sorry for not having worked on this in a while. I've been super busy unfortunately. I'll get back to it this weekend most likely.

msikma commented 3 years ago

So, it's been quite a while since my last comment. Unfortunately I've been unable to work on things for a while, but I'm back now and I wanted to just let you know I'm planning on finalizing this soon.

msikma commented 3 years ago

Merging this now. If anything can be improved please let me know.