nicolas-raoul / jakaroma

Java library and command-line tool to transliterate Japanese kanji to romaji (Latin alphabet)
Apache License 2.0

Conversion to Romaji fails for translated non-Japanese location names #8

Open SFungDev opened 6 years ago

SFungDev commented 6 years ago

Conversion to romaji fails for some location names that were themselves transliterated into Japanese from another language.

For example:

アボッツフォード空港 (Abbotsford Airport)
アンガルシー (Anglesey)
コンスタンツァ (Constanta)
エスペランス空港 (Esperance Airport)

There are many other examples of this behaviour. With debugging turned on, it looks like the issue could lie in the tokenizer or in the token features. I have very little knowledge of Japanese or of how this conversion works, so I can't provide much more insight.

Thanks!

nicolas-raoul commented 6 years ago

Thanks for your feedback! For each one: What result are you getting, and what result did you expect? Cheers!

SFungDev commented 6 years ago

Hi! In every failing case I've seen, the output is identical to the input, except that the non-Katakana characters (the 空港 in airport names) are converted properly. I was hoping for Romaji representations of these location names.

Here are the full debug outputs for the above examples:

./jakaroma.sh アボッツフォード空港
アボッツフォード        名詞,一般,*,*,*,*,*,*,*
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 一般
Type: 一般
アボッツフォード Ku-ko-

./jakaroma.sh アンガルシー
アンガルシー    名詞,固有名詞,組織,*,*,*,*,*,*
Type: 固有名詞
アンガルシー

./jakaroma.sh コンスタンツァ
コンスタンツァ  名詞,固有名詞,組織,*,*,*,*,*,*
Type: 固有名詞
コンスタンツァ

./jakaroma.sh エスペランス空港
エスペランス    名詞,一般,*,*,*,*,*,*,*
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 一般
Type: 一般
エスペランス Ku-ko- 

I haven't looked into performance on Hiragana much as I'm dealing exclusively with locations, which to my knowledge are usually written in either Kanji or Katakana.

Here are some examples of conversions that work as I had hoped:

Abu Dhabi Airport

./jakaroma.sh アブダビ空港
アブダビ        名詞,固有名詞,地域,一般,*,*,アブダビ,アブダビ,アブダビ
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 一般
Abudabi Ku-ko- 

Mt. Fuji Shizuoka Airport

./jakaroma.sh 富士山静岡空港
富士山  名詞,固有名詞,一般,*,*,*,富士山,フジサン,フジサン
静岡    名詞,固有名詞,地域,一般,*,*,静岡,シズオカ,シズオカ
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 固有名詞
Type: 一般
Fujisan Shizuoka Ku-ko- 

Oklahoma City Airport

./jakaroma.sh オクラホマシティー空港
オクラホマ      名詞,固有名詞,地域,一般,*,*,オクラホマ,オクラホマ,オクラホマ
シティー        名詞,一般,*,*,*,*,シティー,シティー,シティー
空港    名詞,一般,*,*,*,*,空港,クウコウ,クーコー
Type: 固有名詞
Type: 一般
Type: 一般
Okurahoma Shiti- Ku-ko- 

I hope these help, I can provide more examples if needed. Cheers!
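Comparing the failing and working debug lines above, one visible difference is that the failing tokens have `*` in the last three feature fields, while the working tokens carry a katakana reading there (for example クウコウ for 空港). The sketch below shows how that difference could be detected. It assumes the feature string is the comma-separated text printed in the debug output and that the reading sits at index 7. `ReadingCheck` and `hasReading` are hypothetical names for illustration, not part of Jakaroma.

```java
class ReadingCheck {
    // Assumed position of the katakana reading in the feature string,
    // judging by debug lines such as "名詞,一般,*,*,*,*,空港,クウコウ,クーコー".
    static final int READING_INDEX = 7;

    /** Returns true if the comma-separated feature string carries a reading. */
    static boolean hasReading(String features) {
        String[] fields = features.split(",", -1);
        if (fields.length <= READING_INDEX) {
            return false;
        }
        String reading = fields[READING_INDEX];
        // "*" is the placeholder seen in the failing tokens' features.
        return !reading.isEmpty() && !reading.equals("*");
    }

    public static void main(String[] args) {
        // Feature strings copied from the debug output in this thread.
        System.out.println(hasReading("名詞,一般,*,*,*,*,*,*,*"));
        System.out.println(hasReading("名詞,固有名詞,地域,一般,*,*,アブダビ,アブダビ,アブダビ"));
    }
}
```

With the feature strings from this thread, the check is false for アボッツフォード, アンガルシー, コンスタンツァ and エスペランス, and true for アブダビ, 富士山, 静岡, シティー and 空港, which lines up with which tokens were left unconverted.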

nicolas-raoul commented 6 years ago

Thanks for the examples!

Very strange that アボッツフォード空港 fails when アブダビ空港 works :-o

If you can work out what the difference is and what triggers the problem, please let us know, thanks! :-)
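One difference visible in the debug output: アボッツフォード has `*` where アブダビ has a katakana reading, so the tokenizer apparently has no reading for it. If that is indeed the trigger, a possible workaround is to romanize the token's surface form directly whenever it is already katakana. The sketch below is an assumption-laden illustration, not Jakaroma's actual code: `KatakanaFallback` and `toRomaji` are hypothetical names, the table is a simplified Hepburn-like mapping that covers only common kana, and the long-vowel mark is written as `-` to mirror the `Ku-ko-` style in the output above.

```java
import java.util.HashMap;
import java.util.Map;

class KatakanaFallback {
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        String[] pairs = {
            "ア","a","イ","i","ウ","u","エ","e","オ","o",
            "カ","ka","キ","ki","ク","ku","ケ","ke","コ","ko",
            "サ","sa","シ","shi","ス","su","セ","se","ソ","so",
            "タ","ta","チ","chi","ツ","tsu","テ","te","ト","to",
            "ナ","na","ニ","ni","ヌ","nu","ネ","ne","ノ","no",
            "ハ","ha","ヒ","hi","フ","fu","ヘ","he","ホ","ho",
            "マ","ma","ミ","mi","ム","mu","メ","me","モ","mo",
            "ヤ","ya","ユ","yu","ヨ","yo",
            "ラ","ra","リ","ri","ル","ru","レ","re","ロ","ro",
            "ワ","wa","ヲ","o","ン","n",
            "ガ","ga","ギ","gi","グ","gu","ゲ","ge","ゴ","go",
            "ザ","za","ジ","ji","ズ","zu","ゼ","ze","ゾ","zo",
            "ダ","da","ヂ","ji","ヅ","zu","デ","de","ド","do",
            "バ","ba","ビ","bi","ブ","bu","ベ","be","ボ","bo",
            "パ","pa","ピ","pi","プ","pu","ペ","pe","ポ","po",
            "ー","-",
            // Two-character combinations (an incomplete selection).
            "キャ","kya","キュ","kyu","キョ","kyo",
            "シャ","sha","シュ","shu","ショ","sho","シェ","she",
            "チャ","cha","チュ","chu","チョ","cho","チェ","che",
            "ジャ","ja","ジュ","ju","ジョ","jo","ジェ","je",
            "ツァ","tsa","ティ","ti","ディ","di",
            "ファ","fa","フィ","fi","フェ","fe","フォ","fo",
            "ウィ","wi","ウェ","we","ウォ","wo"
        };
        for (int i = 0; i < pairs.length; i += 2) {
            TABLE.put(pairs[i], pairs[i + 1]);
        }
    }

    /** Converts katakana to lower-case romaji; unmapped characters pass through unchanged. */
    static String toRomaji(String katakana) {
        StringBuilder out = new StringBuilder();
        boolean doubleNext = false; // set after a small ッ (sokuon)
        int i = 0;
        while (i < katakana.length()) {
            if (katakana.charAt(i) == 'ッ') {
                doubleNext = true;
                i++;
                continue;
            }
            // Prefer a two-character combination, then fall back to one character.
            String romaji = (i + 2 <= katakana.length())
                    ? TABLE.get(katakana.substring(i, i + 2)) : null;
            int consumed = 2;
            if (romaji == null) {
                romaji = TABLE.get(katakana.substring(i, i + 1));
                consumed = 1;
            }
            if (romaji == null) {
                out.append(katakana.charAt(i)); // not in the table: keep as-is
            } else {
                if (doubleNext && romaji.charAt(0) != '-') {
                    out.append(romaji.charAt(0)); // simplified gemination: repeat first letter
                }
                out.append(romaji);
            }
            doubleNext = false;
            i += consumed;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"アボッツフォード", "アンガルシー", "コンスタンツァ", "エスペランス"}) {
            System.out.println(s + " -> " + toRomaji(s));
        }
    }
}
```

Under this table, the four failing names come out as `abottsufo-do`, `angarushi-`, `konsutantsa` and `esuperansu`, and the known-good アブダビ still yields `abudabi`. Capitalization and spacing are left to the caller.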