proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

folia2columns - Paragraph annotation extraction fails on special double quote and a comma #39

Closed: bloemj closed this issue 3 years ago

bloemj commented 3 years ago

When using folia2columns in the new paragraph extraction mode, if the text contains a combination of a special double quote and a comma, annotations for all words after that point in the paragraph are not returned. I see this when extracting lemma sequences but not when extracting word sequences, so it probably only affects the extraction of annotation attributes rather than text.
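One way to narrow down where the annotations go missing is to compare the word and lemma sequences per paragraph and flag those where the lemma sequence is shorter. The following is only a minimal sketch: it assumes the extracted sequences have already been loaded into parallel Python lists with one lemma per word, and `find_truncated` and the sample data are hypothetical illustrations, not part of foliatools or of the actual corpus.

```python
def find_truncated(word_seqs, lemma_seqs):
    """Return (paragraph index, word count, lemma count) for every
    paragraph whose lemma sequence is shorter than its word sequence."""
    mismatches = []
    for i, (words, lemmas) in enumerate(zip(word_seqs, lemma_seqs)):
        if len(lemmas) < len(words):
            mismatches.append((i, len(words), len(lemmas)))
    return mismatches


# Made-up sample data: the first paragraph loses its lemmas after
# the special double quote and the comma; the second is complete.
words = [["Hij", "zei", "“", ",", "ja", "”"], ["Goed", "."]]
lemmas = [["hij", "zeggen", "“", ","], ["goed", "."]]

print(find_truncated(words, lemmas))
```

Running the same check once on the raw folia2columns output and once on the postprocessed data should indicate which of the two steps drops the annotations.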

bloemj commented 3 years ago

I'll have to look later at whether this is due to the code I added or to the data structure it loops through...

bloemj commented 3 years ago

The issue might be unrelated to foliatools after all and caused by my postprocessing instead. I will re-open if not.