nathanhammond opened this issue 3 years ago
I understand that V8 and Gecko are considering switching to ML-based engines for some languages; @FrankYFTang and @zbraniecki can talk more about that.
Could you list the specific issues that such "extensibility" would need to address? In other words, in what respect would your NLP project be harder or easier to implement with changes to the current proposal?
Segmentation of character-based languages (those written without an explicit word delimiter such as a space) is a research problem in natural language processing and computational linguistics. Given that, a user will always achieve better segmentation for these languages (in quality, not necessarily in speed) with a custom-written segmenter than by delegating to ICU's BreakIterator.
Should we consider extensibility to address this limitation as a top-level concern of this API?
For context, I have begun implementing an NLP approach to Cantonese segmentation (https://github.com/cantonese/segmenter), but it reimplements the entire proposed API of `Intl.Segmenter`.
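To make the "reimplements the entire API" point concrete, here is a minimal sketch of what a user-land segmenter ends up duplicating. It is a hypothetical illustration, not code from the cantonese/segmenter project: the class name `ToySegmenter`, the two-word dictionary, and the greedy forward maximum matching strategy are all assumptions standing in for a real NLP model. Only the record shape (`segment`, `index`, `input`, `isWordLike`) mirrors the segment data that `Intl.Segmenter` yields for `granularity: "word"`.

```typescript
// Record shape modeled on the segment data objects produced by
// Intl.Segmenter with granularity "word".
interface SegmentData {
  segment: string;
  index: number; // UTF-16 offset into `input`, as in Intl.Segmenter
  input: string;
  isWordLike: boolean;
}

// Letters or numbers (including CJK ideographs) count as "word-like";
// whitespace and punctuation do not.
const WORD_CHAR = /[\p{L}\p{N}]/u;

// Hypothetical dictionary-based segmenter (greedy forward maximum matching).
// A real NLP segmenter would replace the matching loop, but would still have
// to re-create this iteration protocol by hand.
class ToySegmenter {
  private readonly words: Set<string>;
  private readonly maxLen: number;

  constructor(dictionary: Iterable<string>) {
    this.words = new Set(dictionary);
    // Longest dictionary entry, measured in code points (at least 1).
    this.maxLen = Math.max(1, ...Array.from(this.words, (w) => Array.from(w).length));
  }

  *segment(input: string): Generator<SegmentData> {
    const chars = Array.from(input); // split by code point, not UTF-16 unit
    let i = 0; // position in code points
    let index = 0; // position in UTF-16 units
    while (i < chars.length) {
      // Try the longest candidate first; fall back to a single code point.
      let len = Math.min(this.maxLen, chars.length - i);
      while (len > 1 && !this.words.has(chars.slice(i, i + len).join(""))) {
        len--;
      }
      const segment = chars.slice(i, i + len).join("");
      yield { segment, index, input, isWordLike: WORD_CHAR.test(segment) };
      i += len;
      index += segment.length;
    }
  }
}

// Usage: split a short Cantonese phrase with a toy two-word dictionary.
const toy = new ToySegmenter(["我哋", "食飯"]);
for (const s of toy.segment("我哋去食飯。")) {
  console.log(s.index, s.segment, s.isWordLike);
}
```

The sketch omits the rest of the surface a faithful reimplementation would also need (for example `containing()`, `resolvedOptions()`, and locale negotiation), which is the duplication an extensibility hook in the proposal could avoid.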