polm / cutlet

Japanese to romaji converter in Python
https://polm.github.io/cutlet/
MIT License

Strange Conversion #53

Closed lobochrome closed 6 months ago

lobochrome commented 6 months ago

グロ-バルウイザ

This string, which is some sort of company name as returned by Yuchou Ginkou (Japan Post Bank), is transcribed as

grotesque - baru oui the

That is a very very strange result in my opinion. I wouldn't for the life of me get there from reading the katakana alone.

lobochrome commented 6 months ago

So the company name is "Global With Us", but of course I have no idea what their official banking name is, and the "su" seems to be cut off. But still, how does it lead to "grotesque - baru oui the"?

polm commented 6 months ago

The long vowel mark is written with a nonstandard character (a minus sign instead of ー), so it is not interpreted as part of any word. The rest of the strange conversion just follows from that.
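One way to confirm which character is actually in the string is to print the Unicode name of each character. The following is only a minimal sketch using the Python standard library; `describe_chars` is a hypothetical helper name, not part of cutlet.

```python
import unicodedata


def describe_chars(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# The string from the report, with an ASCII hyphen in the long vowel position.
for ch, code_point, name in describe_chars("グロ-バル"):
    print(ch, code_point, name)

# For comparison, the standard long vowel mark.
print(describe_chars("ー"))
```

In the reported string the third character shows up as `HYPHEN-MINUS`, while the standard long vowel mark is `KATAKANA-HIRAGANA PROLONGED SOUND MARK`.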

lobochrome commented 6 months ago

Maybe I misunderstood how the tool works then - I wouldn't expect it to expand "グロ" to "grotesque" but would expect only "guro." To me, it seems rather aggressive to forward-guess this much.

lobochrome commented 6 months ago

Okay - I dug into the db and extracted the raw string. Of course it's beautiful half-width Katakana - don't we love them all:

振込 カ)グロ-バルウイザ

Maybe an edge case to consider. Maybe not.

polm commented 6 months ago

Half-width katakana has a long vowel mark which is distinct from the hyphen: グローバル can be converted to "Global" without issue.

It looks like you have a bank record. I looked into whether not using the long vowel mark is a restriction of 全銀 (Zengin), and while I had trouble finding an authoritative source, it does seem that 全銀 requires use of the minus symbol. In that case you can do a mass replace of the minus symbol with the long vowel mark before conversion. Since this is not a general limitation of half-width katakana, I am not comfortable making it default behavior.
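A mass replace along these lines could be done as a preprocessing step before the text reaches cutlet. This is a rough sketch under stated assumptions, not cutlet behavior: `fix_long_vowel` is a hypothetical helper, the set of hyphen-like characters is illustrative, and the rule "a hyphen directly after katakana is a long vowel mark" is a heuristic that suits bank records like this one.

```python
import re
import unicodedata

# A hyphen-like character that directly follows a katakana character.
# Assumption: ASCII hyphen-minus, U+2010 HYPHEN and U+2212 MINUS SIGN are
# the stand-ins worth handling; adjust the set for your data.
_HYPHEN_AFTER_KATAKANA = re.compile(r"(?<=[\u30A1-\u30FA\u30FC])[-\u2010\u2212]")


def fix_long_vowel(text):
    """Normalize width, then turn hyphens after katakana into ー."""
    # NFKC folds half-width katakana (including separate voicing marks)
    # into full-width forms and full-width hyphens into ASCII hyphens.
    text = unicodedata.normalize("NFKC", text)
    return _HYPHEN_AFTER_KATAKANA.sub("\u30FC", text)


print(fix_long_vowel("グロ-バル"))
```

Note that NFKC also changes other compatibility characters (for example full-width parentheses become ASCII ones), and a hyphen that really is a separator between two katakana words would be converted as well, so it is worth checking the output against a sample of real records.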

For "guro", no guessing is being done - cutlet just reflects what is in UniDic, which is based on etymology, not orthography. In this case it's certainly a bit surprising, so it's a good candidate for an exception.

polm commented 6 months ago

Closing because, while this was unfortunate, I don't think there's anything to do here that wouldn't have other knock-on effects. Please feel free to follow up if you feel otherwise.

lobochrome commented 6 months ago

Thanks for your hard work. Fully agreed. Still considering what to do, but honestly, in years this was the only really strange result, so I'll let it run as is.