URLs now allow non-ASCII characters (as discussed in #4 and fixed in f4d691a, thanks!), but a different problem has appeared. As of current master (5bb38a9), the http:// prefix is split into separate tokens:
$ echo 'Na adrese http://www.karaoketexty.cz/plíhal je dostupný...' | ./run_tokenizer --tokenizer czech --output vertical
Na
adrese
http
:
/
/
www.karaoketexty.cz/plíhal
je
dostupný
.
.
.
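For comparison, this is the tokenization I would expect: the whole URL, including the http:// prefix, stays a single token. The sketch below uses a hypothetical `tokenize` function and `TOKEN_RE` pattern. It is not the project's tokenizer, only a minimal regex illustration of the expected behaviour. It assumes a URL starts with http:// or https:// and runs to the next whitespace, minus any trailing sentence punctuation.

```python
import re

# Hypothetical pattern, not the project's actual rule. Alternatives are tried in order:
#  1. a URL: scheme, then non-space characters (Unicode-aware, so "í" is fine),
#     ending on a character that is not trailing sentence punctuation;
#  2. a run of word characters (Unicode \w covers Czech letters);
#  3. any single other non-space character (punctuation).
TOKEN_RE = re.compile(r"https?://\S*[^\s.,;:!?)]|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into tokens, keeping http(s) URLs whole."""
    return TOKEN_RE.findall(text)


for token in tokenize("Na adrese http://www.karaoketexty.cz/plíhal je dostupný..."):
    print(token)
```

With this rule, a URL at the end of a sentence also gives up its final period, so "Viz http://example.cz/abc." yields the URL and a separate ".".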
Perhaps this is in the process of being addressed, in which case don't mind me :)