Any %XX sequence is removed from the text when XX is a pair of hexadecimal digits, which looks like a URL-escaping issue (ref. https://github.com/stanfordnlp/CoreNLP/issues/784). Passing a URL-encoded string returns the expected result.
>>> import urllib.parse
>>> [r.originalText for r in client.annotate("100%absolutely sure".lower()).sentencelessToken]
['100', 'solutely', 'sure']
>>> [r.originalText for r in client.annotate(urllib.parse.quote("100%absolutely sure".lower())).sentencelessToken]
['100', '%', 'absolutely', 'sure']
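To illustrate why "%ab" disappears, here is a minimal sketch that uses only Python's standard library. It assumes the receiving side percent-decodes the text once, which is my reading of the linked issue and not something I verified in the server code. The variable names are just for illustration.

```python
import urllib.parse

text = "100%absolutely sure"

# Decoding the raw text treats "%ab" as an escaped byte (0xAB), so the
# literal characters "%ab" are consumed and "absolutely" loses its prefix.
naively_decoded = urllib.parse.unquote(text)

# Encoding first turns "%" into "%25" and " " into "%20", so a single
# decoding pass on the receiving side restores the original text.
encoded = urllib.parse.quote(text)
round_tripped = urllib.parse.unquote(encoded)

print(repr(naively_decoded))
print(encoded)
print(round_tripped)
```

Python's unquote substitutes a replacement character for the invalid byte, whereas the output above shows the sequence dropped entirely, so the exact result differs, but the cause looks the same.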
After finding this bug, I noticed that this library is deprecated. Feel free to close this issue; I just wanted to help anyone who makes the same mistake. Thanks in advance!