segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

Apple Silicon / Arm support #88

Closed yenson-lau closed 1 year ago

yenson-lau commented 1 year ago

What will it take to get support on Apple Silicon / ARM? I'm happy to help out with testing if that can be useful.

bminixhofer commented 1 year ago

Hi! Sorry for being so quiet on this library. I have been working on a major revamp, expanding support to 85 languages, switching to a new training objective without labelled data, and switching the backbone to a BERT-style model.

The new version (now called wtpsplit) supports Apple Silicon out of the box (tested on my M1 Mac).
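
For anyone who wants to help test, it can be worth confirming that Python is actually running natively on Arm: an x86_64 Python build running under Rosetta 2 on an Apple Silicon Mac typically reports `x86_64` rather than `arm64`. Below is a minimal sketch using only the standard library. The helper names `is_arm` and `describe_platform` are made up for illustration and are not part of wtpsplit.

```python
import platform

# Hypothetical helpers for illustration only; not part of wtpsplit.
# Machine strings commonly reported for 64-bit Arm:
# "arm64" (macOS on Apple Silicon) and "aarch64" (Linux on Arm).
ARM_MACHINES = {"arm64", "aarch64"}


def is_arm(machine: str) -> bool:
    """Return True if the machine string names a 64-bit Arm architecture."""
    return machine.lower() in ARM_MACHINES


def describe_platform() -> str:
    """Summarize the OS and CPU architecture seen by the running interpreter."""
    system = platform.system()    # e.g. "Darwin" on macOS
    machine = platform.machine()  # e.g. "arm64" on Apple Silicon
    kind = "Arm" if is_arm(machine) else "non-Arm"
    return f"{system} {machine} ({kind})"


if __name__ == "__main__":
    print(describe_platform())
```

Including this one-line summary in a test report makes it clear whether a result came from a native Arm interpreter or an emulated one.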

miraclebakelaser commented 1 year ago

Apple Silicon support, foreign language support, and paragraph segmentation... you hit the nail on the head with this release. I'm excited to try it out, thank you!

bminixhofer commented 1 year ago

Thanks!! I'm not fully satisfied with the speed yet (it is currently slower than the old nnsplit models), but I hope to address that in an update.