I added an option to folia2columns to extract not just words but also sentences and paragraphs into the column format, which is useful for, e.g., training sentence and paragraph embeddings. It is exposed as a parameter -u that takes one of three values. The default is word, so if no -u parameter is specified, everything should work as before. I haven't extensively tested the sentence/paragraph settings with all the column options, though. The n column (word number relative to sentence) will yield zeroes at the sentence/paragraph level, and the other columns should hopefully still work there (I fixed a few obvious things, like the paragraph and sentence ID columns).
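
To make the intended behaviour concrete, here is a minimal, self-contained Python sketch of how a unit setting could change what each output row represents. It is not the actual folia2columns code and does not use the FoLiA library: the `Word`, `Sentence` and `Paragraph` classes, the `rows` function, the unit names `sentence` and `paragraph`, and the 1-based word numbering are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


# Hypothetical stand-ins for document elements (not the FoLiA library classes).
@dataclass
class Word:
    id: str
    text: str


@dataclass
class Sentence:
    id: str
    words: List[Word]


@dataclass
class Paragraph:
    id: str
    sentences: List[Sentence]


def sentence_text(sentence: Sentence) -> str:
    """Join the words of a sentence with single spaces."""
    return " ".join(word.text for word in sentence.words)


def rows(paragraphs: List[Paragraph], unit: str = "word") -> Iterator[Tuple[str, int, str]]:
    """Yield one (id, n, text) row per unit.

    n is the word number relative to its sentence (assumed 1-based here).
    It has no meaning for a whole sentence or paragraph, so it is 0 there.
    """
    if unit not in ("word", "sentence", "paragraph"):
        raise ValueError(f"unknown unit: {unit}")
    for paragraph in paragraphs:
        if unit == "paragraph":
            text = " ".join(sentence_text(s) for s in paragraph.sentences)
            yield (paragraph.id, 0, text)
            continue
        for sentence in paragraph.sentences:
            if unit == "sentence":
                yield (sentence.id, 0, sentence_text(sentence))
                continue
            for n, word in enumerate(sentence.words, start=1):
                yield (word.id, n, word.text)


# A tiny example document: one paragraph with two sentences.
doc = [
    Paragraph("p1", [
        Sentence("p1.s1", [Word("p1.s1.w1", "Hello"), Word("p1.s1.w2", "world")]),
        Sentence("p1.s2", [Word("p1.s2.w1", "Bye")]),
    ])
]

if __name__ == "__main__":
    for unit in ("word", "sentence", "paragraph"):
        print(unit, list(rows(doc, unit)))
```

Calling `rows(doc)` without a unit gives one row per word, mirroring the unchanged default, while the sentence and paragraph settings give one row per element with `n` set to zero.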