proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

Folia2columns paragraphs #36

bloemj closed 3 years ago

bloemj commented 3 years ago

I added an option to folia2columns to extract not just words but also sentences and paragraphs into the column format, which is useful for, e.g., training sentence and paragraph embeddings. It is exposed as a parameter -u with three options. The default is word, so if no -u parameter is specified, everything should work as before. I haven't extensively tested the sentence/paragraph options with all the column options, though. Using the n option (word number relative to the sentence) will yield zeroes at the sentence/paragraph level, and hopefully the other options still work there (I fixed a few obvious things, such as the paragraph and sentence ID column options).
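
To illustrate the idea, here is a minimal sketch in plain Python. It is not the actual folia2columns code: the toy nested-list document, the function name `extract_rows`, and the unit names `"word"`, `"sentence"`, and `"paragraph"` are all assumptions made for illustration. It shows one row per unit, with the word-number column falling back to zero above the word level.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a FoLiA document:
# a list of paragraphs, each a list of sentences, each a list of words.
Document = List[List[List[str]]]
Row = Tuple[int, int, int, str]  # (paragraph no., sentence no., word no., text)


def extract_rows(doc: Document, unit: str = "word") -> Iterator[Row]:
    """Yield one row per unit; the unit names are assumed, not the real -u values."""
    if unit not in ("word", "sentence", "paragraph"):
        raise ValueError(f"unknown unit: {unit!r}")
    for p_no, paragraph in enumerate(doc, start=1):
        if unit == "paragraph":
            # No sentence or word number applies at the paragraph level.
            text = " ".join(word for sentence in paragraph for word in sentence)
            yield (p_no, 0, 0, text)
            continue
        for s_no, sentence in enumerate(paragraph, start=1):
            if unit == "sentence":
                # The word number is meaningless here, so it is zero.
                yield (p_no, s_no, 0, " ".join(sentence))
            else:
                for n, word in enumerate(sentence, start=1):
                    yield (p_no, s_no, n, word)
```

Calling `list(extract_rows(doc, "sentence"))` on such a toy document gives one row per sentence with a zero in the word-number position, which mirrors the behaviour of the n option described above.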

proycon commented 3 years ago

Thanks! Looks good! Merged!