issues
search
mideind
/
Tokenizer
A tokenizer for Icelandic text
Other
27
stars
6
forks
source link
Ospl
#38
Closed
Holado
closed
2 years ago
Holado
commented
2 years ago
one_sent_per_line function reverted; --original does the job instead.
Two periods normalized to one; I can't find any case where that is valid.
More than 4 punctuation marks normalized to the first one.