proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

folia2txt on soft hyphens #54

Open pirolen opened 1 year ago

pirolen commented 1 year ago

I have the impression that folia2txt silently removes soft hyphens. In foliautils, FoLiA-2text has an option --restore-formatting which reproduces them. Both folia2txt and FoLiA-2text keep line breaks when producing plain text from FoLiA. I had naively imagined these converters would simply yield running text, but that's surely fine.

proycon commented 1 year ago

You mean the <t-hbr/> elements, right? Indeed, I don't think folia2txt exposes an option to restore them. But since FoLiA-2text does, you can just use that.
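
For illustration only, here is a minimal Python sketch of what restoring versus dropping these markers could look like. It is not how folia2txt or FoLiA-2text are implemented. It assumes that `<t-hbr/>` appears as an empty element inside a `<t>` text element in the FoLiA namespace, and the helper name `text_with_hyphens` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the FoLiA XML namespace, as used in FoLiA documents.
FOLIA_NS = "http://ilk.uvt.nl/folia"
SOFT_HYPHEN = "\u00ad"


def text_with_hyphens(t_element, hyphen=SOFT_HYPHEN):
    """Flatten a <t> element to a string (hypothetical helper).

    Each <t-hbr/> child is replaced by `hyphen`; pass hyphen="" to drop
    the markers instead. Other nested markup is flattened recursively.
    """
    parts = [t_element.text or ""]
    for child in t_element:
        if child.tag == f"{{{FOLIA_NS}}}t-hbr":
            parts.append(hyphen)
        else:
            parts.append(text_with_hyphens(child, hyphen))
        # Text that follows the child element, up to the next sibling.
        parts.append(child.tail or "")
    return "".join(parts)


# Made-up sample fragment, not taken from a real FoLiA document.
sample = '<t xmlns="http://ilk.uvt.nl/folia">exam<t-hbr/>ple text</t>'
t = ET.fromstring(sample)

restored = text_with_hyphens(t)             # soft hyphen kept at the break
stripped = text_with_hyphens(t, hyphen="")  # break marker dropped
```

With the default, the soft hyphen (U+00AD) is emitted at each break position; passing an empty string yields the plain joined word, which matches the behaviour described above for folia2txt.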