tc39 / proposal-intl-segmenter

Unicode text segmentation for ECMAScript
https://tc39.github.io/proposal-intl-segmenter/

FYI: ICU+WASM based polyfill ongoing work #118

Open methyl opened 4 years ago

methyl commented 4 years ago

Some ongoing but already usable work on a more complete polyfill is being done here: https://github.com/surferseo/intl-segmenter-polyfill / https://www.npmjs.com/package/intl-segmenter-polyfill. It still needs some fine-tuning (such as achieving API parity), but the biggest chunk of work, compiling ICU to a WASM module efficiently, is done.

See https://github.com/surferseo/intl-segmenter-polyfill/blob/master/examples/node.js for how to use it with Node; web usage is almost identical (you need to fetch the wasm file instead of loading it via the fs module). We already use it in production and are going to actively maintain it until browsers and Node get on par with the proposal (and probably even longer, since we also use it in Elixir via https://github.com/tessi/wasmex).
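
For reference, here is a minimal sketch of the kind of code that API parity would make work unchanged against either a native implementation or a polyfill. It assumes the iterable `segment()` result and the `isWordLike` field as they appear in the `Intl.Segmenter` API; the `segmentWords` helper and its `SegmenterCtor` parameter are hypothetical names used only for illustration, and the polyfill's current interface may differ.

```javascript
// Hypothetical helper: collect the word-like segments of `text`.
// `SegmenterCtor` is an assumption for illustration: pass the native
// Intl.Segmenter (the default here) or a polyfill constructor that
// matches the same API shape.
function segmentWords(text, locale = 'en', SegmenterCtor = globalThis.Intl?.Segmenter) {
  if (typeof SegmenterCtor !== 'function') {
    // No native support and no polyfill supplied.
    throw new Error('No Intl.Segmenter available; load a polyfill first');
  }

  const segmenter = new SegmenterCtor(locale, { granularity: 'word' });
  const words = [];

  // segment() returns an iterable of { segment, index, input, isWordLike }.
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Skip whitespace and punctuation segments.
    if (isWordLike) words.push(segment);
  }
  return words;
}

console.log(segmentWords('foo bar baz'));
```

Passing the constructor in as a parameter keeps the calling code identical whether segmentation comes from the runtime or from a WASM-backed polyfill.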

sffc commented 4 years ago

Cool; thanks for sharing!

Out of curiosity, how big is your .wasm file?

CC @aheninger @FrankYFTang

methyl commented 4 years ago

~350KB gzipped, but of the dictionary-based segmenters, only Thai is included.