sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License
9.08k stars 1.13k forks

TextBlob ngrams removes some symbols #334

Open PasaOpasen opened 4 years ago

PasaOpasen commented 4 years ago
from textblob import TextBlob
TextBlob('c# c++ r').ngrams(2)
# [WordList(['c', 'c']), WordList(['c', 'r'])]
leo-p-labs commented 3 years ago

That is because the ngrams() function builds a WordList() object, which itself calls the words() function. In the source we can see a parameter at this level called 'include_punc', set to False by default. Maybe this parameter should be accessible at the ngrams() level to keep the symbols.
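
Until something like that exists, one possible workaround is to build the n-grams yourself from tokens that keep their symbols. The sketch below is only an illustration and does not use TextBlob at all: `whitespace_ngrams` is a hypothetical helper name, and it assumes that splitting on whitespace is an acceptable tokenization for your text (so punctuation attached to a word, such as a trailing comma, stays attached too).

```python
def whitespace_ngrams(text, n=2):
    """Return n-grams over whitespace-separated tokens, keeping symbols.

    Hypothetical helper, not part of TextBlob. Tokens are produced by
    str.split(), so 'c#' and 'c++' are kept exactly as written.
    """
    if n <= 0:
        return []
    tokens = text.split()
    # Slide a window of size n across the token list.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


print(whitespace_ngrams("c# c++ r", 2))
# [['c#', 'c++'], ['c++', 'r']]
```

This returns plain lists rather than WordList objects, so TextBlob-specific methods on the results are not available.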