Non-whitespace-separated language support

jamestomasino / stutter

RSVP for browsers

https://addons.mozilla.org/en-US/firefox/addon/stutter/

GNU General Public License v3.0


Non-whitespace-separated language support #91

Open · c01o opened this issue 2 years ago

c01o commented 2 years ago

Currently, stutter uses /[\n\r\s]+/ as its word delimiter, so languages that don't separate words with whitespace, such as Japanese, are unusable. google/budoux looks like it would work for at least Japanese, but since stutter finds word boundaries dynamically, supporting it would require some breaking changes.
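To illustrate the problem, here is a minimal sketch of splitting text on the delimiter quoted above. The `splitWords` function is hypothetical and is not stutter's actual code, which may handle more cases; it only shows that a Japanese sentence with no spaces comes back as one oversized "word".

```javascript
// Hypothetical helper: mirrors the delimiter quoted in the issue,
// not necessarily how stutter applies it internally.
const DELIMITER = /[\n\r\s]+/;

function splitWords(text) {
  // Note: \s already matches \n and \r, so this behaves like /\s+/.
  // Filter out empty strings produced by leading/trailing whitespace.
  return text.split(DELIMITER).filter((w) => w.length > 0);
}

const english = splitWords("Reading one word at a time");
const japanese = splitWords("日本語は単語の間に空白を入れない");

console.log(english.length);  // 6
console.log(japanese.length); // 1 (the whole sentence is a single "word")
```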

jamestomasino commented 2 years ago

Some of this can be addressed in locales.json, but the Block.js file needs some additional attention as well. I've got an open issue for Persian that will clean up most of that logic and make it more flexible, and I suspect Japanese will be easier to address after that. I'll definitely need contribution help for it, though: determining how to split text based on kanji vs. hiragana/katakana will be tricky.
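A naive first pass at the kanji vs. hiragana/katakana idea is to start a new chunk whenever the Unicode script changes. The sketch below uses hypothetical `scriptOf` and `splitByScript` helpers (not anything in stutter) built on JavaScript's `\p{Script=Han}`-style property escapes. It handles simple particle boundaries, but it also shows why this gets tricky: it splits a verb like 食べる away from its okurigana.

```javascript
// Hypothetical helpers for illustration only; not part of stutter.
function scriptOf(ch) {
  if (/\p{Script=Han}/u.test(ch)) return "kanji";
  if (/\p{Script=Hiragana}/u.test(ch)) return "hiragana";
  if (/\p{Script=Katakana}/u.test(ch)) return "katakana";
  if (/\s/u.test(ch)) return "space";
  return "other";
}

// Start a new chunk every time the script of the next character changes.
function splitByScript(text) {
  const chunks = [];
  let current = "";
  let currentScript = null;
  for (const ch of text) { // for...of iterates by code point
    const script = scriptOf(ch);
    if (current && script !== currentScript) {
      chunks.push(current);
      current = "";
    }
    current += ch;
    currentScript = script;
  }
  if (current) chunks.push(current);
  // Drop whitespace-only chunks.
  return chunks.filter((c) => !/^\s+$/u.test(c));
}

console.log(splitByScript("日本語は単語の間に"));
// → [ '日本語', 'は', '単語', 'の', '間', 'に' ]
console.log(splitByScript("食べる"));
// → [ '食', 'べる' ]  (the verb is wrongly split from its okurigana)
```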

c01o commented 2 years ago

TBH, I highly doubt that implementing a Japanese phrase (文節, bunsetsu) detector ourselves would pay off, and I suggest using existing libraries instead.
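As one example of leaning on existing code, modern JavaScript engines ship `Intl.Segmenter`, which performs dictionary-based word segmentation for Japanese and other languages. This is a swapped-in built-in, not budoux, and `segmentWords` is a hypothetical helper; availability varies by browser version, so the sketch falls back to the whitespace delimiter from the issue. Non-word segments such as punctuation are attached to the preceding word so that an RSVP reader does not flash a lone comma or 。 on screen.

```javascript
// Hypothetical helper: an assumption about how segmentation could work,
// not stutter's implementation and not budoux's API.
function segmentWords(text, locale) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    // Fallback: the whitespace delimiter quoted in the issue.
    return text.split(/[\n\r\s]+/).filter((w) => w.length > 0);
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (segment.trim() === "") continue; // skip whitespace between words
    if (!isWordLike && words.length > 0) {
      // Attach punctuation to the preceding word.
      words[words.length - 1] += segment;
    } else {
      words.push(segment);
    }
  }
  return words;
}

console.log(segmentWords("Hello, world.", "en")); // → [ 'Hello,', 'world.' ]
console.log(segmentWords("日本語は単語の間に空白を入れない。", "ja"));
```

The exact Japanese boundaries depend on the engine's ICU dictionary data, so they may differ between browsers.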