ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

downloader: Remove end space in paragraphs #42

Closed: matyaskopp closed this issue 3 years ago

matyaskopp commented 3 years ago

Trailing spaces in paragraphs produce empty paragraphs:

<seg xml:id="ps2013-025-02-883-023.u20.p2"> </seg>
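
A minimal sketch of one way to avoid such output, not the project's actual fix (the downloader code is not shown in this thread). It assumes un-namespaced `<seg>` elements under a common parent and uses Python's standard `xml.etree.ElementTree` for illustration only; the function name `strip_trailing_space` is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_trailing_space(parent):
    """Trim trailing whitespace in each <seg> child; drop segs left empty.

    Hypothetical helper: assumes <seg> elements carry no XML namespace.
    """
    for seg in list(parent.findall("seg")):
        children = list(seg)
        if children:
            # With child elements, the final text run is the tail of the last child.
            last = children[-1]
            last.tail = (last.tail or "").rstrip() or None
        else:
            seg.text = (seg.text or "").rstrip() or None
        # A seg with neither text nor children is an empty paragraph: remove it.
        if not children and seg.text is None:
            parent.remove(seg)
    return parent


# Made-up input: one paragraph with a trailing space, one whitespace-only paragraph.
u = ET.fromstring("<u><seg>Hello. </seg><seg> </seg></u>")
strip_trailing_space(u)
remaining = [s.text for s in u.findall("seg")]
```

Here `remaining` holds only the trimmed text of the first paragraph, because the whitespace-only `<seg>` is removed.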

matyaskopp commented 3 years ago

https://github.com/ufal/ParCzech/commit/9e7ee08f2258a7d0d0232150082cda11ea02f4b1

matyaskopp commented 3 years ago

Side effect: notes at the end of a paragraph are moved out of <seg>.
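
The side effect described above could look roughly like the following sketch. It only illustrates the described behavior and is not taken from the linked commit; it assumes un-namespaced `<seg>` and `<note>` elements, and the function name `move_trailing_notes_out` is hypothetical.

```python
import xml.etree.ElementTree as ET


def move_trailing_notes_out(parent):
    """Move <note> elements that end a <seg> so they follow the <seg> instead.

    Hypothetical helper: a note counts as trailing when only whitespace
    (or nothing) follows it inside the seg.
    """
    for seg in list(parent.findall("seg")):
        insert_at = list(parent).index(seg) + 1
        while len(seg) and seg[-1].tag == "note" and not (seg[-1].tail or "").strip():
            note = seg[-1]
            seg.remove(note)
            note.tail = None  # discard the trailing whitespace that followed the note
            # Inserting at the same index each time keeps the notes' original order.
            parent.insert(insert_at, note)
    return parent


# Made-up input: a paragraph ending in a note followed by a trailing space.
u = ET.fromstring("<u><seg>Text.<note>(applause)</note> </seg></u>")
move_trailing_notes_out(u)
tags = [child.tag for child in u]
```

After the call, `tags` lists the `<seg>` followed by the `<note>` as siblings, and the `<seg>` no longer ends in whitespace left behind by the note.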