openvanilla / McBopomofo

小麥注音輸入法 (McBopomofo, a Bopomofo/Zhuyin input method)
http://mcbopomofo.openvanilla.org/
MIT License

Add missing phrases for all 47 prefectures and 9 regions of Japan #437

Closed xatier closed 8 months ago

xatier commented 8 months ago

Ref:

https://www.moedict.tw/%E6%BD%9F
https://www.wikiwand.com/zh-tw/%E9%83%BD%E9%81%93%E5%BA%9C%E7%B8%A3


I noticed that some Japanese prefecture names are missing or inconsistent in the dictionary, so I wrote a script to add the missing ones (both the bare name and the name followed by 縣). I tried my best to follow the ordering and frequencies of the existing files, but I found that both BPMFMappings.txt and phrase.occ are somewhat unordered.
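For illustration only, the kind of script described above could look like the sketch below. It is not the script used for this PR. The line format (`phrase reading reading ...`) is assumed from the 新潟 diff quoted later in this thread, `missing_entries` is a hypothetical helper, and the readings are typed in by hand. Prefectures that take 都, 道, or 府 instead of 縣 are not handled.

```python
# Hypothetical sketch: emit BPMFMappings-style lines for phrases that are
# not present yet. The "phrase reading reading ..." format is an assumption.
XIAN = ("縣", "ㄒㄧㄢˋ")


def missing_entries(prefectures, existing_lines):
    """Return new lines for each prefecture, with and without the 縣 suffix."""
    # The first whitespace-separated field of each line is taken as the phrase.
    existing = {line.split()[0] for line in existing_lines if line.strip()}
    new_lines = []
    for name, readings in prefectures.items():
        candidates = [
            (name, readings),
            (name + XIAN[0], readings + [XIAN[1]]),
        ]
        for phrase, bpmf in candidates:
            if phrase not in existing:
                new_lines.append(f"{phrase} {' '.join(bpmf)}")
    return new_lines


# Hand-entered sample data for two prefectures.
prefectures = {
    "熊本": ["ㄒㄩㄥˊ", "ㄅㄣˇ"],
    "新潟": ["ㄒㄧㄣ", "ㄒㄧˋ"],
}
existing_lines = ["新潟 ㄒㄧㄣ ㄒㄧˋ"]

for line in missing_entries(prefectures, existing_lines):
    print(line)
```

Frequencies for phrase.occ would still need to be chosen separately; this sketch only covers the reading mappings.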

xatier commented 8 months ago

On the ordering issue: I keep this little script locally to keep my user dictionary nice and neat, and I would suggest having something similar in this repo.

https://gist.github.com/xatier/9f34fb64884e3cbfd66f24716beb6e7a
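The sketch below is not the content of the gist; it only illustrates what "deterministic ordering" could mean for a line-based dictionary file. It assumes one entry per line and no comment lines, and `normalize` is a hypothetical name. Sorting uses Python's default string comparison (by code point), so the result does not depend on the machine's locale.

```python
def normalize(lines):
    """Return the unique, non-empty lines in a stable, locale-independent order."""
    # Strip surrounding whitespace, drop blanks, and de-duplicate via a set.
    entries = {line.strip() for line in lines if line.strip()}
    # sorted() on str compares by Unicode code point, so the order is the
    # same on every machine regardless of locale settings.
    return sorted(entries)


sample = ["熊本縣 ㄒㄩㄥˊ ㄅㄣˇ ㄒㄧㄢˋ", "", "新潟 ㄒㄧㄣ ㄒㄧˋ", "新潟 ㄒㄧㄣ ㄒㄧˋ"]
for line in normalize(sample):
    print(line)
```

Because the function is idempotent, running it on an already normalized file changes nothing, which keeps diffs small.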

xatier commented 8 months ago

@zonble thanks for merging the PR promptly.

@tianjianjiang

1) 潟 is correct in BPMFBase.txt [1]; however, it was incorrect in BPMFMappings.txt.

潟 ㄒㄧˋ xi4 vu4 big5
-新潟 ㄒㄧㄣ ㄒㄧㄝˋ
+新潟 ㄒㄧㄣ ㄒㄧˋ

2) I believe we only need to output Traditional Chinese characters, as McBopomofo is a Chinese input method, we don't need to output Japanese Kanji, unless we have some other use cases to justify that. My main concern was that I wasn't able to type 熊本 as a default phrase with fcitx5-mcbopomofo :)

3) As I mentioned earlier, I noticed that BPMFMappings.txt and phrase.occ are unordered in various chunks. Hence, I proposed having some scripts to make the ordering deterministic, and possibly integrating them into the CI.

[1] https://github.com/openvanilla/McBopomofo/blob/master/Source/Data/BPMFBase.txt#L11670
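A mismatch like the one in point 1 could in principle be caught automatically. The following is a rough sketch, not an existing tool in the repo. It assumes that BPMFBase.txt lines start with `character reading` (as in the 潟 line above) and that BPMFMappings.txt lines are `phrase reading reading ...`; `load_base` and `inconsistent` are hypothetical names.

```python
from collections import defaultdict


def load_base(lines):
    """Map each character to the set of readings listed for it."""
    readings = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            readings[fields[0]].add(fields[1])
    return readings


def inconsistent(mapping_lines, base):
    """Yield (phrase, char, reading) where the reading is absent from the base table."""
    for line in mapping_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        phrase, bpmf = fields[0], fields[1:]
        if len(phrase) != len(bpmf):
            continue  # skip entries without exactly one reading per character
        for ch, reading in zip(phrase, bpmf):
            # Characters missing from the base table are ignored, not flagged.
            if ch in base and reading not in base[ch]:
                yield phrase, ch, reading


# The 潟 line is quoted from this thread; the 新 line is illustrative only.
base = load_base(["潟 ㄒㄧˋ xi4 vu4 big5", "新 ㄒㄧㄣ"])
for hit in inconsistent(["新潟 ㄒㄧㄣ ㄒㄧㄝˋ"], base):
    print(hit)
```

Hits should be treated as candidates for review rather than errors, since a phrase may legitimately use a reading that the per-character table does not list.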

tianjianjiang commented 8 months ago

Hi @xatier,

  1. 潟 [...] was incorrect in BPMFMappings.txt. [...]

Got it! Thanks for the further info.

  2. I believe we only need to output Traditional Chinese characters, as McBopomofo is a Chinese input method, we don't need to output Japanese Kanji, unless we have some other use cases to justify that. My main concern was that I wasn't able to type 熊本 as a default phrase with fcitx5-mcbopomofo :)

I don't have any preference for Kanji, so I believe we are on the same page. And duly noted for the case of "熊本." My main reasoning is that, while ??縣 is necessary to keep the homophonous ??現 from being formed instead, it is probably OK not to have ??地方. But I don't have a strong preference on this, either.

  3. As I mentioned earlier, I noticed that BPMFMappings.txt and phrase.occ are unordered in various chunks. Hence, I proposed having some scripts to make the ordering deterministic, and possibly integrating them into the CI.

For readability, especially in diffs, that sounds good to me. IMO, it's just a matter of priority. :)