Closed workflowsguy closed 9 years ago
Hi Guy
Thanks for your report.
I think the second print statement in your script would have to be changed to:
>>> print(", ".join(blob.words.lemmatize()))
das, Auto, sein, sehr, schön
The output you've described is not a bug but standard behaviour on Python 2 (Python 3 would give you the expected output).
EXPLANATION: When Python 2 prints a list, it shows the repr() of each element, and the repr() of a unicode string escapes non-ASCII characters. Your second output line is therefore the standard representation, not a bug, and the characters themselves are intact:
>>> a = [u"schön"]
>>> print(a)
[u'sch\xf6n']
# The umlaut is displayed correctly when the
# string within the list is printed:
>>> print(a[0])
schön
blob.words.lemmatize() returns a list (a WordList, consistent with the main textblob package):
>>> blob.words.lemmatize()
WordList([u'das', u'Auto', u'sein', u'sehr', u'sch\xf6n'])
>>> print(blob.words.lemmatize())
[u'das', u'Auto', u'sein', u'sehr', u'sch\xf6n']
You can either iterate over this list to get the string representations:
>>> for lemma in blob.words.lemmatize():
...     print(lemma)
...
das
Auto
sein
sehr
schön
Or you could use the following statement to print the lemmas as a single-line string, as suggested above:
>>> print(", ".join(blob.words.lemmatize()))
das, Auto, sein, sehr, schön
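For anyone who wants to reproduce the difference without Python 2 or textblob installed, here is a minimal sketch for Python 3. It makes two assumptions: the plain list `lemmas` is a hypothetical stand-in for the result of blob.words.lemmatize(), and the built-in ascii() is used to approximate what Python 2 shows when a list is printed (it escapes non-ASCII characters the same way, but does not add the u prefix).

```python
# Hypothetical stand-in for blob.words.lemmatize(); textblob is not needed here.
lemmas = ["das", "Auto", "sein", "sehr", "sch\u00f6n"]

# ascii() builds the repr() of the list and escapes non-ASCII characters,
# which is close to what Python 2 displays for print(some_list).
escaped = ascii(lemmas)

# Joining the elements yields one string, so the umlaut is shown as-is.
joined = ", ".join(lemmas)

print(escaped)
print(joined)
```

The list representation is meant for debugging, so it escapes the umlaut; the joined string is ordinary text and shows it unchanged.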
Thank you for the explanation, Markus. Using textblob on Python 3 seems to be easier with regard to how "special" characters are handled (I had not used Python 2 for a while and had forgotten about those issues).
Thanks again,
Guy
When trying the examples from the tutorial on Python 2.7.9, blob.words.lemmatize() incorrectly outputs "schön" as "u'sch\xf6n'".
This is the code I used:
Output:
Thanks,
Guy