⚠️ Kadot is no longer in development. The project had two branches: 0.x and 1.x (this one).
Kadot is a high-level open-source library to easily process text documents. It relies on vector representations of documents or words in order to solve NLP tasks such as summarization, spellchecking or classification.
# How to get n-grams using kadot.
>>> from kadot.tokenizers import regex_tokenizer
>>> hello_tokens = regex_tokenizer("Kadot just lets you process a text easily.")
>>> hello_tokens.ngrams(n=2)
[('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]
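To make the idea behind `ngrams` concrete, here is a minimal sketch in plain Python. It is not Kadot's implementation: `simple_tokenize` and `ngrams` are hypothetical helpers written for illustration, and the regex `\w+` is an assumed tokenization rule that drops punctuation.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens, dropping punctuation.

    Hypothetical helper for illustration; not part of Kadot's API.
    """
    return re.findall(r"\w+", text)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n over the tokens; if there are fewer
    # than n tokens, the range is empty and so is the result.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = simple_tokenize("Kadot just lets you process a text easily.")
bigrams = ngrams(tokens, n=2)
print(bigrams)
```

On the sentence above, this sketch produces the same seven pairs as the Kadot example, because the final period is discarded during tokenization.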
⚠️ All these new features may not yet be available on GitHub.
Kadot is released under the MIT license.
Issues and pull requests are gratefully welcomed. Come help me!
I am not a native English speaker. If you see any language mistakes in this README or in the code (docstrings included), please open an issue.