tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

Last paragraph not processed #84

Closed by marcverhagen 5 years ago

marcverhagen commented 5 years ago

In some rare circumstances the last paragraph of a text file is not processed. This happens when the input file is a DOS file and the last line ends in the Windows end-of-line characters "\r\n". Adding an extra newline to the end of the input makes the problem go away.

This problem does not occur when you have only one line:

She sleeps.\r

But it does occur if the line above is preceded by another line.

He sleeps.\r\n\r\nShe sleeps.\r\n

This problem does not occur on Linux or OS X.
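To reproduce the report, the two inputs quoted above need to reach disk byte for byte, without the platform translating the line endings. The following is a minimal sketch, not part of the toolkit: the file names and the `write_case` helper are hypothetical, and the byte strings are copied from the examples as quoted in this thread.

```python
import os
import tempfile

# Inputs as quoted in the report, kept as bytes so "\r\n" is written verbatim.
SINGLE_LINE = b"She sleeps.\r"
TWO_PARAGRAPHS = b"He sleeps.\r\n\r\nShe sleeps.\r\n"


def write_case(directory, name, data):
    """Write data to directory/name in binary mode and return the path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:  # binary mode: no newline translation
        handle.write(data)
    return path


# Hypothetical file names; any names will do.
case_dir = tempfile.mkdtemp()
single_path = write_case(case_dir, "single-line.txt", SINGLE_LINE)
double_path = write_case(case_dir, "two-paragraphs.txt", TWO_PARAGRAPHS)
print(single_path)
print(double_path)
```

The resulting files can then be given to the toolkit as input; only the second one should show the missing last paragraph.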

marcverhagen commented 5 years ago

The problem is in split_paragraphs() in docmodel/docstructure_parser.py: the last remaining text is added as a paragraph only if no paragraphs were found before it. So it works when the only paragraph is not properly ended, but not when there is more text before that paragraph.
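The logic described above can be illustrated with a simplified sketch. This is not the code in docmodel/docstructure_parser.py: the function names, the `BLANK_LINE` pattern, and the blank-line splitting rule are assumptions made for illustration, and the sketch does not try to reproduce the dependence on DOS line endings. It only shows how keeping the trailing text only when the paragraph list is empty drops a final paragraph, and one plausible repair, which is not necessarily what the fixing commit does.

```python
import re

# Assumed paragraph separator: two or more line ends ("\n" or "\r\n"),
# optionally followed by spaces or tabs.
BLANK_LINE = re.compile(r"(?:\r?\n[ \t]*){2,}")


def _split_closed(text):
    """Return the paragraphs closed by a blank line, plus the leftover tail."""
    paragraphs, start = [], 0
    for match in BLANK_LINE.finditer(text):
        chunk = text[start:match.start()].strip()
        if chunk:
            paragraphs.append(chunk)
        start = match.end()
    return paragraphs, text[start:].strip()


def split_paragraphs_buggy(text):
    """Keep the tail only when no paragraph was found (the pattern described)."""
    paragraphs, tail = _split_closed(text)
    if tail and not paragraphs:
        paragraphs.append(tail)
    return paragraphs


def split_paragraphs_fixed(text):
    """Keep any non-empty tail, whether or not paragraphs came before it."""
    paragraphs, tail = _split_closed(text)
    if tail:
        paragraphs.append(tail)
    return paragraphs


dos_text = "He sleeps.\r\n\r\nShe sleeps.\r\n"
print(split_paragraphs_buggy(dos_text))   # ['He sleeps.']
print(split_paragraphs_fixed(dos_text))   # ['He sleeps.', 'She sleeps.']
print(split_paragraphs_buggy("She sleeps.\r"))  # ['She sleeps.']
```

With a single unterminated paragraph both versions agree, which matches the observation that the one-line input is processed correctly.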

marcverhagen commented 5 years ago

Fixed in 2807e4e