miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0
3.52k stars, 530 forks

Alternative sentence tokenizer #2

Open miso-belica opened 11 years ago

miso-belica commented 11 years ago

I use NLTK to tokenize text into sentences and words, but that's a big package. Maybe something smaller would be better, something like https://bitbucket.org/trebor74hr/text-sentence/overview
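To give a sense of how small such a replacement could be, here is a minimal sketch of a regex-based sentence and word splitter that uses only the Python standard library. The names `split_sentences` and `split_words` are hypothetical and belong to neither sumy, NLTK, nor the linked package. The splitting rule is a naive assumption: it will mis-split abbreviations such as "Dr. Smith", which is the kind of case NLTK's trained models handle.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?' and is
# followed by an uppercase ASCII letter or a digit. (Assumption for this
# sketch; real tokenizers use trained models or abbreviation lists.)
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")

# A word is a run of letters, optionally joined by apostrophes or hyphens
# (e.g. "It's", "well-known").
_WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def split_sentences(text):
    """Split text into a list of sentences (hypothetical helper)."""
    text = text.strip()
    if not text:
        return []
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def split_words(sentence):
    """Return the words of a sentence, dropping punctuation and digits."""
    return _WORD.findall(sentence)
```

For example, `split_sentences("I use NLTK. It is big! Is it?")` yields three sentences, and `split_words` keeps contractions and hyphenated words together.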

fahadshery commented 5 years ago

How can I use spaCy's tokeniser models? Is there detailed documentation for the package?

mrx23dot commented 2 years ago

spaCy is worse: it uses the huge TensorFlow.

miso-belica commented 2 years ago

How to add support for a custom language or library: https://miso-belica.github.io/sumy/how-to-add-new-language.html
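As a rough illustration of what plugging in another library might look like, the sketch below wraps two arbitrary splitting functions in an object with `to_sentences` and `to_words` methods. It assumes that sumy only needs those two methods from a tokenizer; check the linked page before relying on that. The class name `CallableTokenizer` and the regex splitters are hypothetical stand-ins, not part of sumy.

```python
import re


class CallableTokenizer:
    """Hypothetical adapter: exposes to_sentences() and to_words() on top of
    any two splitting callables (assumed to be all a tokenizer must provide)."""

    def __init__(self, sentence_splitter, word_splitter):
        self._sentence_splitter = sentence_splitter
        self._word_splitter = word_splitter

    def to_sentences(self, paragraph):
        # Strip whitespace and drop empty pieces the splitter may produce.
        pieces = (s.strip() for s in self._sentence_splitter(paragraph))
        return tuple(s for s in pieces if s)

    def to_words(self, sentence):
        return tuple(w for w in self._word_splitter(sentence) if w)


# Stand-in splitters built on the standard library only.
tokenizer = CallableTokenizer(
    sentence_splitter=lambda text: re.split(r"(?<=[.!?])\s+", text),
    word_splitter=lambda sentence: re.findall(r"\w+", sentence),
)
```

To try spaCy instead, the sentence splitter could be something like `lambda text: [s.text for s in nlp(text).sents]`, where `nlp` is a loaded spaCy pipeline. If the assumption about the interface holds, the resulting object should be usable where sumy expects its own tokenizer, for example as the second argument to `PlaintextParser.from_string`.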