Open balmas opened 9 years ago
I had a quick look at this. I could not solve the problem, but I think it has something to do with the preceding nisi, which is not split. I guess the problem lies somewhere in the Worker class...
Another example, even weirder:
Nisi pācem sine morā ab hostibus petīverimus, neque urbs neque domus ūlla stāre poterit.
crashes the server, but if I take the long ī out of petīverimus, it doesn't:
Nisi pācem sine morā ab hostibus petiverimus, neque urbs neque domus ūlla stāre poterit.
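To narrow down which long vowel triggers the crash, it may help to generate one variant of the sentence per macron, each with a single macron removed, and send them one at a time. The sketch below is only a local helper for producing those variants; the function names are made up for illustration and nothing here talks to the server.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, as found in decomposed long vowels


def macron_positions(text):
    """Return the indices of characters that carry a macron."""
    return [
        i for i, ch in enumerate(text)
        if MACRON in unicodedata.normalize("NFD", ch)
    ]


def strip_macron_at(text, index):
    """Return a copy of text with the macron removed from text[index]."""
    decomposed = unicodedata.normalize("NFD", text[index])
    plain = unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))
    return text[:index] + plain + text[index + 1:]


def variants(text):
    """One (index, variant) pair per macron in text."""
    return [(i, strip_macron_at(text, i)) for i in macron_positions(text)]


sentence = ("Nisi pācem sine morā ab hostibus petīverimus, "
            "neque urbs neque domus ūlla stāre poterit.")
candidates = variants(sentence)
for index, variant in candidates:
    print(index, variant)
```

If only the variant with petiverimus stops crashing, that would point at that one token rather than at macrons in general.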
Tokenize this text: "+Noster+poēta+,+nisi+cīvis+Rōmānus+esset+,+ā+populō+nunc+cīvitāte+dōnārētur+.+" (e.g. http://services.perseids.org/llt/segtok?xml=false&shifting=false&newline_boundary=1&inline=true&text=%22+Noster+po%C4%93ta+,+nisi+c%C4%ABvis+R%C5%8Dm%C4%81nus+esset+,+%C4%81+popul%C5%8D+nunc+c%C4%ABvit%C4%81te+d%C5%8Dn%C4%81r%C4%93tur+.+%22&splitting=true)
Note that nunc gets tokenized as:
<w s_n="1" n="12">nun</w><pc s_n="1" n="13">-</pc><pc s_n="1" n="14">-</pc><pc s_n="1" n="15">-</pc><w s_n="1" n="16">-c</w>
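For anyone who wants to reproduce this from a script, here is a rough sketch that rebuilds the request URL from the parameters shown above and lists the tokens in the reported output. It assumes the `+` signs in the quoted text stand for spaces and that the surrounding quote characters are part of the text, as in the URL. The helper names `build_segtok_url` and `parse_tokens` are hypothetical, and the script does not send the request itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and parameters copied from the URL in the report.
BASE_URL = "http://services.perseids.org/llt/segtok"


def build_segtok_url(text):
    """Build the segtok request URL with the same options as the report."""
    params = {
        "xml": "false",
        "shifting": "false",
        "newline_boundary": "1",
        "inline": "true",
        "text": text,
        "splitting": "true",
    }
    # safe="," keeps commas unescaped, as in the reported URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


def parse_tokens(fragment):
    """Return (tag, n, text) for each token element in an output fragment."""
    # The fragment has no single root element, so wrap it before parsing.
    root = ET.fromstring("<tokens>" + fragment + "</tokens>")
    return [(el.tag, el.get("n"), el.text) for el in root]


# Assumption: '+' in the report stands for a space.
text = ('" Noster poēta , nisi cīvis Rōmānus esset , '
        'ā populō nunc cīvitāte dōnārētur . "')
url = build_segtok_url(text)
print(url)

# Output fragment as reported for "nunc".
reported = ('<w s_n="1" n="12">nun</w><pc s_n="1" n="13">-</pc>'
            '<pc s_n="1" n="14">-</pc><pc s_n="1" n="15">-</pc>'
            '<w s_n="1" n="16">-c</w>')
tokens = parse_tokens(reported)
for tag, n, token_text in tokens:
    print(tag, n, token_text)
```

Listing the tokens this way makes the problem easy to spot: the single word nunc comes back as five tokens, three of them bare hyphens tagged as punctuation.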