Open makotokato opened 1 year ago
Consider doing what we did for the CodePointTrieBuilder: rather than writing the code ourselves, we compile the ICU4C builder code into a WASM file and ship that in our repo.
@makotokato Does this block any other issues? Can you set an assignee (or "help wanted") and a milestone (or "backlog")?
Not a blocker.
The segmenter now uses `char16trie` for the dictionary segmenter. Some East Asian dictionaries can be removed or moved to LSTM, but Chinese and Japanese still use it. Currently, the data is generated by ICU4C's tools, and the binary output of those tools is then converted to a TOML file. So I think it would be better to add a generation tool that builds the `char16trie` data from ICU4C's dictionary text files.
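As a rough illustration of the first step such a generation tool would need, here is a minimal sketch that reads dictionary text into sorted UTF-16 key/value pairs, which a trie builder could then consume. The line format (one `word` or `word<TAB>value` per line, `#` comments, optional BOM), the function name `parse_dictionary`, and `DEFAULT_VALUE` are assumptions made for illustration; they are not taken from ICU4C or ICU4X, and the trie serialization itself is not shown.

```rust
use std::collections::BTreeMap;

/// Hypothetical value assigned to entries that carry no explicit value.
const DEFAULT_VALUE: u32 = 0;

/// Parse dictionary text into (UTF-16 key, value) pairs sorted by code unit.
///
/// Assumed input format (an assumption, not confirmed by this thread):
/// - one entry per line: `word` or `word<whitespace>value`
/// - `#` starts a comment that runs to the end of the line
/// - blank lines and a leading byte order mark are ignored
fn parse_dictionary(text: &str) -> Result<Vec<(Vec<u16>, u32)>, String> {
    // A BTreeMap keeps keys sorted and drops duplicates (last entry wins).
    let mut entries: BTreeMap<Vec<u16>, u32> = BTreeMap::new();
    let text = text.trim_start_matches('\u{feff}');

    for (idx, raw) in text.lines().enumerate() {
        // Drop the comment part, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }

        let mut fields = line.split_whitespace();
        let word = match fields.next() {
            Some(w) => w,
            None => continue,
        };
        let value = match fields.next() {
            Some(v) => v
                .parse::<u32>()
                .map_err(|e| format!("line {}: bad value {:?}: {}", idx + 1, v, e))?,
            None => DEFAULT_VALUE,
        };

        // The trie is keyed on UTF-16 code units, so encode the word that way.
        entries.insert(word.encode_utf16().collect(), value);
    }

    Ok(entries.into_iter().collect())
}
```

Sorting the keys up front is a design choice for the sketch: trie builders generally work from a sorted, duplicate-free list, so a later builder step can assume that shape.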