tunystom closed this issue 10 years ago
Confirmed, the problem is in the MorphoDiTa library itself, in the English tokenizer. I will fix it when I get back from vacation (18th August).
The issue can be sidestepped by tokenizing the input manually instead of using the English tokenizer.
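As a rough illustration of the workaround, here is a minimal sketch of a manual tokenizer in plain Python. The helper `simple_tokenize` and its regular expression are hypothetical and not part of MorphoDiTa; the tokenization the English models actually expect may differ (for example in how possessives are split), so treat it only as a starting point.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical helper, not part of MorphoDiTa.
# A token is either a word (optionally with internal apostrophes,
# ASCII ' or typographic U+2019) or a single non-space symbol.
_TOKEN_RE = re.compile(u"\\w+(?:['\u2019]\\w+)*|[^\\w\\s]", re.UNICODE)


def simple_tokenize(text):
    """Split a Unicode string into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


tokens = simple_tokenize(u"The people\u2019s choice, they said.")
print(tokens)
```

The resulting token list would then be handed to the tagger in place of the tokenizer's output; that step is left out here because it needs a model file.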
Fixed by 01588cf.
A new stable version, 1.3, containing the fix has been released on GitHub, CPAN and PyPI.
I encountered the following issue when testing the example code for the Python bindings:
The following error pops up:
The exception is raised on hitting the word "people" ending with the ’ (forward quote). It seems that the string "people’s" is truncated in the middle of the multibyte UTF-8 code sequence for the quote, which is \xe2\x80\x99.
The taggers for Czech seem to work fine; at least they do not fail on the quotes.
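To show why such a truncation produces an error, here is a small sketch in plain Python that does not use MorphoDiTa at all. The cut point (after the first byte of the quote) is an assumption chosen for illustration; the report does not say where exactly the library cuts the string.

```python
# -*- coding: utf-8 -*-
text = u"people\u2019s"
raw = text.encode("utf-8")  # six ASCII bytes, then \xe2\x80\x99, then "s"

# Assumed cut point: keep only the first byte of the three-byte quote.
truncated = raw[:7]

try:
    truncated.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    # The incomplete sequence cannot be decoded as UTF-8.
    decode_failed = True
    print("decode failed: %s" % exc.reason)
```

Any code that later treats the truncated bytes as UTF-8 text fails in the same way, which matches the behaviour described above.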
I am using Python 2.7 and built the code and bindings from source on Ubuntu 12.04 with the proper versions of g++ and SWIG.