kawu / concraft-pl

A morphosyntactic tagger for Polish based on conditional random fields
http://zil.ipipan.waw.pl/Concraft
BSD 2-Clause "Simplified" License

Concraft hangs on input with non-printable characters #31

Closed kawu closed 9 years ago

kawu commented 9 years ago

Apparently Maca ignores some special characters (e.g. '\x200b'), i.e. it does not include them in its output. Since many such characters are not spaces (notably non-printable characters), the current algorithm for reading Maca output expects more characters than it is given, and the process freezes.
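
One possible way to make the alignment tolerant is sketched below in Python. This is not Concraft's actual code (Concraft is written in Haskell), only an illustration of the idea. The names `is_ignorable` and `align` are hypothetical, and the sketch assumes that the characters Maca drops are whitespace or fall in the Unicode categories Cf (format) and Cc (control). It walks the original text alongside the tokens, skips dropped characters, and raises an error on a mismatch instead of waiting for input that never arrives.

```python
import unicodedata
from typing import List, Tuple


def is_ignorable(ch: str) -> bool:
    """Hypothetical predicate: whitespace, or a format/control character.

    Assumption: these are the characters the analyser may leave out.
    """
    return ch.isspace() or unicodedata.category(ch) in ("Cf", "Cc")


def align(original: str, tokens: List[str]) -> List[Tuple[str, str]]:
    """Pair each token with the run of skipped characters preceding it."""
    result = []
    pos = 0
    for tok in tokens:
        start = pos
        # Skip whitespace and dropped characters before the token.
        while pos < len(original) and is_ignorable(original[pos]):
            pos += 1
        skipped = original[start:pos]
        for ch in tok:
            # Tolerate dropped characters in the middle of a token.
            while (pos < len(original) and original[pos] != ch
                   and is_ignorable(original[pos])):
                pos += 1
            # Fail loudly rather than wait for characters that never come.
            if pos >= len(original) or original[pos] != ch:
                raise ValueError(
                    "cannot align token %r at offset %d" % (tok, pos))
            pos += 1
        result.append((skipped, tok))
    return result


if __name__ == "__main__":
    # U+200B (zero width space) sits inside the first word.
    print(align("foo\u200bbar baz", ["foobar", "baz"]))
```

The design choice here is that a mismatch is an error rather than something to block on: if the tokens cannot be found in the original text, the caller gets a `ValueError` with the offending offset.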