Currently, in the Dutch tokenizer, a sentence can span paragraphs if no sentence boundary is detected at the end of a paragraph, e.g. a line without a final period.
In particular, the string "Dit is de kop\n\nEn een artikel. Met een tweede zin." yields the following text layer:
I would expect a sentence to always end where a paragraph ends. Is there some substantive reason for keeping the current behaviour?
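To make the expected behaviour concrete, here is a minimal sketch in Python. It is not the tokenizer's actual code: `split_paragraphs`, `naive_sentences` and `sentences_per_paragraph` are hypothetical helpers, and the regex-based sentence splitter is only a stand-in for the real one. The assumption is that a paragraph boundary is one or more blank lines, and that sentence splitting runs per paragraph, so a sentence can never cross a boundary.

```python
import re


def split_paragraphs(text):
    # Assumption: a paragraph boundary is one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def naive_sentences(paragraph):
    # Stand-in for the real sentence splitter: break after '.', '!' or '?'
    # when followed by whitespace. Text without final punctuation is
    # returned as a single sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def sentences_per_paragraph(text):
    # Splitting paragraphs first guarantees every sentence ends at or
    # before the end of its paragraph.
    return [naive_sentences(p) for p in split_paragraphs(text)]


text = "Dit is de kop\n\nEn een artikel. Met een tweede zin."
for number, sentences in enumerate(sentences_per_paragraph(text), start=1):
    print(number, sentences)
```

With this ordering, "Dit is de kop" ends up as its own sentence in the first paragraph, instead of being merged with "En een artikel." from the second.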