goerlitz opened this issue 6 years ago
I have to correct myself: the character encoding in Python 3 breaks again if text.encode() is removed. So this string problem seems to be caused by one of the incompatible changes in Python 3.
I found out that a different wrapper implementation uses this piece of code to fix the issue:
```python
import sys

if sys.version_info.major >= 3:
    text = text.encode('utf-8')
```
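A version check is one option. A minimal sketch of an alternative, assuming the goal is simply "hand UTF-8 bytes to the server, encoding at most once", is to branch on the type of the value instead of the interpreter version. The helper name to_utf8_bytes is hypothetical and not part of any wrapper. On Python 2, unicode is encoded and str (which is bytes) passes through; on Python 3, str is encoded and bytes passes through, so both Case 1 and Case 2 below would be accepted.

```python
def to_utf8_bytes(text):
    """Hypothetical helper: return UTF-8 bytes, encoding at most once.

    Python 2: unicode -> encoded, str (an alias of bytes) -> unchanged.
    Python 3: str -> encoded, bytes -> unchanged.
    """
    if isinstance(text, bytes):
        # Already encoded: assume it is UTF-8 and pass it through untouched.
        return text
    # A text object (Python 2 unicode / Python 3 str): encode exactly once.
    return text.encode('utf-8')


print(to_utf8_bytes(u'caf\u00e9'))
print(to_utf8_bytes(b'caf\xc3\xa9'))
```

Because the check is on the value rather than on sys.version_info, calling the helper twice is harmless: the second call sees bytes and returns them unchanged.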
I'm trying to annotate some Unicode strings. But following example throws errors.
Case 1: Passing Unicode strings.
throws
because it's a string of type 'unicode' in Python 2.
Case 2: Passing encoded Unicode strings:
throws
because the string has already been encoded and cannot be encoded again.
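Why a second encode fails can be shown in a few lines. On Python 2, str.encode() exists, but it first decodes the byte string implicitly with the ASCII codec, so non-ASCII UTF-8 bytes raise a UnicodeDecodeError. That is consistent with Case 2, although the exact message is not reproduced here. The sketch below shows the Python 3 equivalent, where bytes has no encode() method at all; the sample string is illustrative.

```python
text = 'caf\u00e9'
encoded = text.encode('utf-8')  # bytes: b'caf\xc3\xa9'

# Python 3 bytes objects have no .encode(), so encoding a second time fails outright.
try:
    encoded.encode('utf-8')
except AttributeError as exc:
    print('second encode failed:', exc)

# To get text back from bytes, decode instead of encoding again.
roundtrip = encoded.decode('utf-8')
print(roundtrip == text)
```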
Both lines of code referenced in these error messages were introduced in #6 in May 2016 to fix some Unicode issues.
However, it seems the explicit encoding in line 25 is no longer required, because if it is removed, case 2 works perfectly (both in Python 2 and Python 3). Note also that encoding issues were fixed in CoreNLP in October 2016 (https://github.com/stanfordnlp/CoreNLP/issues/270).
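One plausible reason Python 3 still needs an explicit encode: the standard library's http.client encodes a str request body as ISO-8859-1 (Latin-1), so non-Latin-1 text fails unless it is converted to UTF-8 bytes first. The sketch below assumes the wrapper POSTs the raw text to a CoreNLP server (default port 9000, with a JSON properties query parameter). It only builds the request and never sends it. The function name build_corenlp_request and its parameters are hypothetical, not the wrapper's actual API.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, server_url='http://localhost:9000',
                          annotators='tokenize,ssplit'):
    """Hypothetical: build (but do not send) a POST request for a CoreNLP server."""
    properties = {'annotators': annotators, 'outputFormat': 'json'}
    query = urllib.parse.urlencode({'properties': json.dumps(properties)})
    # Encode exactly once: str -> UTF-8 bytes; bytes are assumed to be UTF-8 already.
    body = text if isinstance(text, bytes) else text.encode('utf-8')
    return urllib.request.Request(server_url + '/?' + query, data=body, method='POST')


req = build_corenlp_request('Caf\u00e9 d\u00e9j\u00e0 vu')
print(req.get_method(), req.full_url)
print(req.data)
```

Passing either the str or its already-encoded bytes produces the same request body, which is the behavior both cases in this issue need.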