sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License

TextBlob doesn't use specified tokenizer for words property #316

Open plammens opened 4 years ago

plammens commented 4 years ago

The documentation specifies:

The words and sentences properties are helpers that use the textblob.tokenizers.WordTokenizer and textblob.tokenizers.SentenceTokenizer classes, respectively.

You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.

While that is true for the tokens property, it is currently not true for the words property, which always uses the default textblob.tokenizers.word_tokenize regardless of the tokenizer passed to the constructor:

https://github.com/sloria/TextBlob/blob/e6cd9791ae42e37b5a2132676f9ca69340e8d8c0/textblob/blob.py#L381-L389
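To make the mismatch concrete, here is a minimal, self-contained sketch of the behavior described above. It is a toy model, not TextBlob's actual code: the `Blob`, `DefaultWordTokenizer`, and `TabTokenizer` classes are hypothetical stand-ins, and the regex-based default is only an assumption chosen so that the example runs without any dependencies.

```python
import re


class DefaultWordTokenizer:
    """Hypothetical stand-in for the library default: runs of word characters."""

    def tokenize(self, text):
        return re.findall(r"\w+", text)


class TabTokenizer:
    """Hypothetical custom tokenizer that splits on tab characters only."""

    def tokenize(self, text):
        return text.split("\t")


class Blob:
    """Toy model of the reported behavior; not TextBlob's implementation."""

    def __init__(self, text, tokenizer=None):
        self.raw = text
        self.tokenizer = tokenizer or DefaultWordTokenizer()

    @property
    def tokens(self):
        # Honors the tokenizer passed to the constructor.
        return self.tokenizer.tokenize(self.raw)

    @property
    def words(self):
        # Ignores self.tokenizer and always uses the default,
        # which is the mismatch this issue reports.
        return DefaultWordTokenizer().tokenize(self.raw)


blob = Blob("New York\tSan Francisco", tokenizer=TabTokenizer())
print(blob.tokens)  # ['New York', 'San Francisco']
print(blob.words)   # ['New', 'York', 'San', 'Francisco']
```

In this model, making `words` consistent with the documentation would mean having it call `self.tokenizer.tokenize(self.raw)` as well; whether the real property should do that, or whether the documentation should change instead, is the open question here.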

I don't know if this is a bug or a documentation issue; I can make a PR in either case.