proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

Folia2columns paragraphs #37

Closed bloemj closed 3 years ago

bloemj commented 3 years ago

In folia2columns, many of the column options did not produce meaningful results when extracting sentences or paragraphs, so I modified them to extract the full sequence of those annotations (e.g. lemma classes) rather than just the first instance of that annotation in the sentence/paragraph. This makes it possible to extract lemmatized sentences and paragraphs, which is what I wanted to do. I also added an option to output the text of sentences or paragraphs in tokenized form, rather than just dumping the contents of text().
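To illustrate the difference between the old and new behaviour, here is a minimal sketch. It does not use the FoLiA library or the actual folia2columns code; `Token`, `first_lemma`, `lemma_sequence` and `tokenized_text` are hypothetical names, and a sentence is assumed to be a plain list of tokens that each carry a text and a lemma class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a token with a lemma annotation."""
    text: str
    lemma: str


def first_lemma(unit: List[Token]) -> str:
    """Old behaviour as described above: only the first lemma in the unit."""
    return unit[0].lemma if unit else ""


def lemma_sequence(unit: List[Token]) -> str:
    """New behaviour: the lemma classes of all tokens, space-separated."""
    return " ".join(tok.lemma for tok in unit)


def tokenized_text(unit: List[Token]) -> str:
    """Tokenized text option: token texts joined by single spaces."""
    return " ".join(tok.text for tok in unit)


sentence = [
    Token("The", "the"),
    Token("cats", "cat"),
    Token("slept", "sleep"),
    Token(".", "."),
]

print(first_lemma(sentence))     # the
print(lemma_sequence(sentence))  # the cat sleep .
print(tokenized_text(sentence))  # The cats slept .
```

For a sentence or paragraph column, the first variant returns a single lemma, which is rarely useful; the second returns the lemmatized unit as a whole.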

proycon commented 3 years ago

Yes, I realized they wouldn't work for the larger units, but this is indeed much better now! Thanks! Merged.