loristns / Kadot

Natural language processing using unsupervised vector representations.
https://github.com/loristns/Kadot
natural-language-processing text-classification text-generation word-embeddings

Kadot

⚠️ Kadot is no longer in development. The project had two branches: 0.x and 1.x (this one).

Kadot is a high-level open-source library for processing text documents easily. It relies on vector representations of documents or words to solve NLP tasks such as summarization, spellchecking, or classification.
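
To make the idea of "vector representations" concrete, here is a minimal sketch in plain Python. It does not use Kadot and is not how Kadot is implemented: it assumes a simple bag-of-words count vector and a nearest-example rule based on cosine similarity. The helper names (`bag_of_words`, `cosine_similarity`, `classify`) and the sample sentences are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse count vector {word: count}."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def classify(text, labelled_examples):
    """Return the label of the labelled example most similar to text."""
    vector = bag_of_words(text)
    best = max(
        labelled_examples,
        key=lambda pair: cosine_similarity(vector, bag_of_words(pair[0])),
    )
    return best[1]


# Hypothetical training data, made up for this sketch.
examples = [
    ("what is the weather like today", "weather"),
    ("play some music please", "music"),
]
print(classify("will the weather be nice", examples))  # prints: weather
```

The query shares the words "the" and "weather" with the first example and no words with the second, so the first label wins. Real systems usually replace raw counts with learned embeddings, but the comparison step stays the same.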

# How to get n-grams using kadot.
>>> from kadot.tokenizers import regex_tokenizer
>>> hello_tokens = regex_tokenizer("Kadot just lets you process a text easily.")
>>> hello_tokens.ngrams(n=2)

[('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]
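
For readers without Kadot installed, the same sliding-window idea can be sketched in plain Python. This is an illustration only, not Kadot's code: `simple_ngrams` is a hypothetical helper, and the `\w+` regular expression is an assumed stand-in for whatever pattern `regex_tokenizer` actually uses.

```python
import re


def simple_ngrams(text, n=2):
    """Return the n-grams of text as a list of word tuples."""
    # Assumed tokenization: keep runs of word characters, drop punctuation.
    tokens = re.findall(r"\w+", text)
    # Slide a window of size n over the tokens, one step at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = simple_ngrams("Kadot just lets you process a text easily.")
print(bigrams)
```

With eight tokens and `n=2`, the window fits in seven positions, so the call returns seven pairs, starting with `('Kadot', 'just')` and ending with `('text', 'easily')`. A text shorter than `n` tokens yields an empty list.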

What's 🆕 in 1.0?

⚠️ Some of these new features may not yet be available on GitHub.

⚖️ License

Kadot is released under the MIT license.

🚀 Contribute

Issues and pull requests are very welcome. Come help me!

I am not a native English speaker. If you see any language mistakes in this README or in the code (docstrings included), please open an issue.