timdream / jszhuyin

JS 注音 (JSZhuyin): "Smart" Chinese Zhuyin Input Method in JavaScript, with automatic character selection.
https://jszhuyin.timdream.org/
MIT License

Do not assume UCS-2 length === Phonetic phone length #12

Closed: timdream closed this issue 9 years ago

timdream commented 9 years ago

There are various symbols, combining characters, and non-BMP characters that do not fit this assumption. They are currently filtered out so we don't break our data structure, but this should be fixed.

```
McBopomofoLineData: Skipping phrase: "𡻈" (2). The phonetics are "ㄓㄣ" (1).
McBopomofoLineData: Skipping data: "¥" (1). The phonetics are "ㄖˋ-ㄩㄢˊ" (2).
McBopomofoLineData: Skipping data: "©" (1). The phonetics are "ㄅㄢˇ-ㄑㄩㄢˊ" (2).
McBopomofoLineData: Skipping data: "®" (1). The phonetics are "ㄓㄨˋ-ㄘㄜˋ" (2).
McBopomofoLineData: Skipping data: "€" (1). The phonetics are "ㄡ-ㄩㄢˊ" (2).
McBopomofoLineData: Skipping data: "℃" (1). The phonetics are "ㄕㄜˋ-ㄕˋ" (2).
McBopomofoLineData: Skipping data: "℉" (1). The phonetics are "ㄏㄨㄚˊ-ㄕˋ" (2).
McBopomofoLineData: Skipping data: "℡" (1). The phonetics are "ㄉㄧㄢˋ-ㄏㄨㄚˋ" (2).
McBopomofoLineData: Skipping data: "™" (1). The phonetics are "ㄕㄤ-ㄅㄧㄠ" (2).
McBopomofoLineData: Skipping data: "∞" (1). The phonetics are "ㄨˊ-ㄒㄧㄢˋ" (2).
McBopomofoLineData: Skipping data: "∫" (1). The phonetics are "ㄐㄧ-ㄈㄣ" (2).
McBopomofoLineData: Skipping data: "∴" (1). The phonetics are "ㄙㄨㄛˇ-ㄧˇ" (2).
McBopomofoLineData: Skipping data: "∵" (1). The phonetics are "ㄧㄣ-ㄨㄟˋ" (2).
McBopomofoLineData: Skipping data: "≠" (1). The phonetics are "ㄅㄨˋ-ㄉㄥˇ-ㄩˊ" (3).
McBopomofoLineData: Skipping data: "≤" (1). The phonetics are "ㄒㄧㄠˇ-ㄩˊ-ㄉㄥˇ-ㄩˊ" (4).
McBopomofoLineData: Skipping data: "≥" (1). The phonetics are "ㄉㄚˋ-ㄩˊ-ㄉㄥˇ-ㄩˊ" (4).
McBopomofoLineData: Skipping data: "⌬" (1). The phonetics are "ㄅㄣˇ-ㄏㄨㄢˊ" (2).
McBopomofoLineData: Skipping data: "▲" (1). The phonetics are "ㄙㄢ-ㄐㄧㄠˇ" (2).
McBopomofoLineData: Skipping data: "△" (1). The phonetics are "ㄙㄢ-ㄐㄧㄠˇ" (2).
McBopomofoLineData: Skipping data: "▼" (1). The phonetics are "ㄉㄠˋ-ㄙㄢ-ㄐㄧㄠˇ" (3).
McBopomofoLineData: Skipping data: "▽" (1). The phonetics are "ㄉㄠˋ-ㄙㄢ-ㄐㄧㄠˇ" (3).
McBopomofoLineData: Skipping data: "☂" (1). The phonetics are "ㄩˇ-ㄙㄢˇ" (2).
McBopomofoLineData: Skipping data: "☃" (1). The phonetics are "ㄒㄩㄝˇ-ㄖㄣˊ" (2).
McBopomofoLineData: Skipping data: "☎" (1). The phonetics are "ㄉㄧㄢˋ-ㄏㄨㄚˋ" (2).
McBopomofoLineData: Skipping data: "☏" (1). The phonetics are "ㄉㄧㄢˋ-ㄏㄨㄚˋ" (2).
McBopomofoLineData: Skipping data: "☠" (1). The phonetics are "ㄧㄡˇ-ㄉㄨˊ" (2).
McBopomofoLineData: Skipping data: "☢" (1). The phonetics are "ㄈㄨˊ-ㄕㄜˋ-ㄒㄧㄥˋ" (3).
McBopomofoLineData: Skipping data: "☣" (1). The phonetics are "ㄕㄥ-ㄨˋ-ㄒㄧㄥˋ" (3).
McBopomofoLineData: Skipping data: "☮" (1). The phonetics are "ㄏㄜˊ-ㄆㄧㄥˊ" (2).
McBopomofoLineData: Skipping data: "☯" (1). The phonetics are "ㄧㄣ-ㄧㄤˊ" (2).
McBopomofoLineData: Skipping data: "♀" (1). The phonetics are "ㄋㄩˇ-ㄒㄧㄥˋ" (2).
McBopomofoLineData: Skipping data: "♂" (1). The phonetics are "ㄋㄢˊ-ㄒㄧㄥˋ" (2).
McBopomofoLineData: Skipping data: "♈" (1). The phonetics are "ㄇㄨˇ-ㄧㄤˊ" (2).
McBopomofoLineData: Skipping data: "♉" (1). The phonetics are "ㄐㄧㄣ-ㄋㄧㄡˊ" (2).
McBopomofoLineData: Skipping data: "♊" (1). The phonetics are "ㄕㄨㄤ-ㄗˇ" (2).
McBopomofoLineData: Skipping data: "♋" (1). The phonetics are "ㄐㄩˋ-ㄒㄧㄝˋ" (2).
McBopomofoLineData: Skipping data: "♌" (1). The phonetics are "ㄕ-ㄗ˙" (2).
McBopomofoLineData: Skipping data: "♍" (1). The phonetics are "ㄔㄨˋ-ㄋㄩˇ" (2).
McBopomofoLineData: Skipping data: "♎" (1). The phonetics are "ㄊㄧㄢ-ㄔㄥˋ" (2).
McBopomofoLineData: Skipping data: "♏" (1). The phonetics are "ㄊㄧㄢ-ㄒㄧㄝ" (2).
McBopomofoLineData: Skipping data: "♐" (1). The phonetics are "ㄕㄜˋ-ㄕㄡˇ" (2).
McBopomofoLineData: Skipping data: "♑" (1). The phonetics are "ㄇㄛˊ-ㄐㄧㄝˊ" (2).
McBopomofoLineData: Skipping data: "♒" (1). The phonetics are "ㄕㄨㄟˇ-ㄆㄧㄥˊ" (2).
McBopomofoLineData: Skipping data: "♓" (1). The phonetics are "ㄕㄨㄤ-ㄩˊ" (2).
McBopomofoLineData: Skipping data: "♔" (1). The phonetics are "ㄍㄨㄛˊ-ㄨㄤˊ" (2).
McBopomofoLineData: Skipping data: "♕" (1). The phonetics are "ㄏㄨㄤˊ-ㄏㄡˋ" (2).
McBopomofoLineData: Skipping data: "♖" (1). The phonetics are "ㄔㄥˊ-ㄅㄠˇ" (2).
McBopomofoLineData: Skipping data: "♗" (1). The phonetics are "ㄓㄨˇ-ㄐㄧㄠˋ" (2).
McBopomofoLineData: Skipping data: "♘" (1). The phonetics are "ㄑㄧˊ-ㄕˋ" (2).
McBopomofoLineData: Skipping data: "♙" (1). The phonetics are "ㄕˋ-ㄅㄧㄥ" (2).
McBopomofoLineData: Skipping data: "♚" (1). The phonetics are "ㄍㄨㄛˊ-ㄨㄤˊ" (2).
McBopomofoLineData: Skipping data: "♛" (1). The phonetics are "ㄏㄨㄤˊ-ㄏㄡˋ" (2).
McBopomofoLineData: Skipping data: "♜" (1). The phonetics are "ㄔㄥˊ-ㄅㄠˇ" (2).
McBopomofoLineData: Skipping data: "♝" (1). The phonetics are "ㄓㄨˇ-ㄐㄧㄠˋ" (2).
McBopomofoLineData: Skipping data: "♞" (1). The phonetics are "ㄑㄧˊ-ㄕˋ" (2).
McBopomofoLineData: Skipping data: "♟" (1). The phonetics are "ㄕˋ-ㄅㄧㄥ" (2).
McBopomofoLineData: Skipping data: "♨" (1). The phonetics are "ㄨㄣ-ㄑㄩㄢˊ" (2).
McBopomofoLineData: Skipping data: "♪" (1). The phonetics are "ㄧㄣ-ㄩㄝˋ" (2).
McBopomofoLineData: Skipping data: "♭" (1). The phonetics are "ㄧㄣ-ㄩㄝˋ" (2).
McBopomofoLineData: Skipping data: "♯" (1). The phonetics are "ㄧㄣ-ㄩㄝˋ" (2).
McBopomofoLineData: Skipping data: "♻" (1). The phonetics are "ㄗ-ㄩㄢˊ-ㄏㄨㄟˊ-ㄕㄡ" (4).
McBopomofoLineData: Skipping data: "⚛" (1). The phonetics are "ㄩㄢˊ-ㄗˇ" (2).
McBopomofoLineData: Skipping data: "⚾" (1). The phonetics are "ㄅㄤˋ-ㄑㄧㄡˊ" (2).
McBopomofoLineData: Skipping data: "✆" (1). The phonetics are "ㄉㄧㄢˋ-ㄏㄨㄚˋ" (2).
McBopomofoLineData: Skipping data: "✈" (1). The phonetics are "ㄈㄟ-ㄐㄧ" (2).
McBopomofoLineData: Skipping data: "㈱" (1). The phonetics are "ㄓㄨ-ㄕˋ-ㄏㄨㄟˋ-ㄕㄜˋ" (4).
McBopomofoLineData: Skipping data: "㊑" (1). The phonetics are "ㄓㄨ-ㄕˋ-ㄏㄨㄟˋ-ㄕㄜˋ" (4).
McBopomofoLineData: Skipping data: "㍿" (1). The phonetics are "ㄓㄨ-ㄕˋ-ㄏㄨㄟˋ-ㄕㄜˋ" (4).
McBopomofoLineData: Skipping data: "㎍" (1). The phonetics are "ㄨㄟˊ-ㄎㄜˋ" (2).
McBopomofoLineData: Skipping data: "㎎" (1). The phonetics are "ㄏㄠˊ-ㄎㄜˋ" (2).
McBopomofoLineData: Skipping data: "㎏" (1). The phonetics are "ㄍㄨㄥ-ㄐㄧㄣ" (2).
McBopomofoLineData: Skipping data: "㎕" (1). The phonetics are "ㄨㄟˊ-ㄕㄥ" (2).
McBopomofoLineData: Skipping data: "㎖" (1). The phonetics are "ㄏㄠˊ-ㄕㄥ" (2).
McBopomofoLineData: Skipping data: "㎚" (1). The phonetics are "ㄋㄞˋ-ㄇㄧˇ" (2).
McBopomofoLineData: Skipping data: "㎛" (1). The phonetics are "ㄨㄟˊ-ㄇㄧˇ" (2).
McBopomofoLineData: Skipping data: "㎜" (1). The phonetics are "ㄍㄨㄥ-ㄌㄧˊ" (2).
McBopomofoLineData: Skipping data: "㎝" (1). The phonetics are "ㄍㄨㄥ-ㄈㄣ" (2).
McBopomofoLineData: Skipping data: "㎞" (1). The phonetics are "ㄍㄨㄥ-ㄌㄧˇ" (2).
McBopomofoLineData: Skipping data: "㎟" (1). The phonetics are "ㄆㄧㄥˊ-ㄈㄤ-ㄏㄠˊ-ㄇㄧˇ" (4).
McBopomofoLineData: Skipping data: "㎠" (1). The phonetics are "ㄆㄧㄥˊ-ㄈㄤ-ㄍㄨㄥ-ㄈㄣ" (4).
McBopomofoLineData: Skipping data: "㎡" (1). The phonetics are "ㄆㄧㄥˊ-ㄈㄤ-ㄍㄨㄥ-ㄔˇ" (4).
McBopomofoLineData: Skipping data: "㎢" (1). The phonetics are "ㄆㄧㄥˊ-ㄈㄤ-ㄍㄨㄥ-ㄌㄧˇ" (4).
McBopomofoLineData: Skipping data: "㎤" (1). The phonetics are "ㄌㄧˋ-ㄈㄤ-ㄍㄨㄥ-ㄈㄣ" (4).
McBopomofoLineData: Skipping data: "㎥" (1). The phonetics are "ㄌㄧˋ-ㄈㄤ-ㄍㄨㄥ-ㄔˇ" (4).
McBopomofoLineData: Skipping data: "㏂" (1). The phonetics are "ㄕㄤˋ-ㄨˇ" (2).
McBopomofoLineData: Skipping data: "㏈" (1). The phonetics are "ㄈㄣ-ㄅㄟˋ" (2).
McBopomofoLineData: Skipping data: "㏑" (1). The phonetics are "ㄗˋ-ㄖㄢˊ-ㄉㄨㄟˋ-ㄕㄨˋ" (4).
McBopomofoLineData: Skipping data: "㏒" (1). The phonetics are "ㄉㄨㄟˋ-ㄙㄨˋ" (2).
McBopomofoLineData: Skipping data: "㏕" (1). The phonetics are "ㄅㄞˇ-ㄨㄢˋ" (2).
McBopomofoLineData: Skipping data: "㏖" (1). The phonetics are "ㄇㄛˋ-ㄦˇ" (2).
McBopomofoLineData: Skipping data: "㏗" (1). The phonetics are "ㄙㄨㄢ-ㄐㄧㄢˇ-ㄉㄨˋ" (3).
McBopomofoLineData: Skipping data: "㏗" (1). The phonetics are "ㄙㄨㄢ-ㄐㄧㄢˇ-ㄓˊ" (3).
McBopomofoLineData: Skipping data: "㏘" (1). The phonetics are "ㄒㄧㄚˋ-ㄨˇ" (2).
McBopomofoLineData: Skipping data: "嗧" (1). The phonetics are "ㄐㄧㄚ-ㄌㄨㄣˊ" (2).
McBopomofoLineData: Skipping data: "瓩" (1). The phonetics are "ㄑㄧㄢ-ㄨㄚˇ" (2).
McBopomofoLineData: Skipping data: "" (1). The phonetics are "ㄆㄧㄥˊ-ㄍㄨㄛˇ" (2).
McBopomofoLineData: Skipping data: "£" (1). The phonetics are "ㄧㄥ-ㄅㄤˋ" (2).
McBopomofoLineData: Skipping data: "🌳" (2). The phonetics are "ㄨㄤˊ-ㄐㄧㄢˋ-ㄇㄧㄣˊ" (3).
McBopomofoLineData: Skipping data: "🐨" (2). The phonetics are "ㄨˊ-ㄨㄟˇ-ㄒㄩㄥˊ" (3).
```
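For readers unfamiliar with why these counts disagree, here is a minimal sketch of the mismatch. It assumes the first number in each log entry is the JavaScript string `length` (UTF-16 code units) and the second is the number of `-`-separated syllables in the phonetics; that reading fits the entries above but is not confirmed by the log itself. The helper names (`codeUnitLength`, `codePointLength`, `syllableCount`, `describe`) are hypothetical and are not part of JSZhuyin.

```javascript
// Hypothetical helpers for illustration only; none of these names come from JSZhuyin.

// Number of UTF-16 code units, which is what String.prototype.length reports.
// A non-BMP character is stored as a surrogate pair and therefore counts as 2.
function codeUnitLength(str) {
  return str.length;
}

// Number of Unicode code points. The string iterator used by Array.from
// yields a surrogate pair as a single element.
function codePointLength(str) {
  return Array.from(str).length;
}

// Number of syllables, assuming syllables are joined with "-" as in the log.
function syllableCount(phonetics) {
  return phonetics.split('-').length;
}

// Collect the three counts side by side for one text/phonetics pair.
function describe(text, phonetics) {
  return {
    text,
    codeUnits: codeUnitLength(text),
    codePoints: codePointLength(text),
    syllables: syllableCount(phonetics),
  };
}

// A non-BMP Han character: 2 code units, 1 code point, 1 syllable.
console.log(describe('𡻈', 'ㄓㄣ'));
// A BMP symbol read as a three-syllable word: 1 code unit, 3 syllables.
console.log(describe('≠', 'ㄅㄨˋ-ㄉㄥˇ-ㄩˊ'));
// An emoji: 2 code units, 1 code point, 3 syllables.
console.log(describe('🐨', 'ㄨˊ-ㄨㄟˇ-ㄒㄩㄥˊ'));
```

Counting code points instead of code units would only reconcile the first kind of entry. Symbols such as "≠" are a single character read as several syllables, so no way of measuring the text length makes the two numbers equal.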

timdream commented 9 years ago

Interestingly,

McBopomofoLineData: Skipping phrase: "𡻈" (2). The phonetics are "ㄓㄣ" (1).

Does not show up the second time I compile the data...

timdream commented 9 years ago

So it turns out I had already removed that assumption a long time ago... my memory didn't serve me right. The previous breakage was simply because the DataPack did not correctly store phonetics that are longer than the word or phrase. It has been fixed in 1978805.

That means the IME now emits Emoji!