proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

folia2html: XSL conversion results in extra spaces #29

Closed proycon closed 3 years ago

proycon commented 3 years ago

Reported by @pirolen:

Extra spaces appear around the HTML spans that hold superscripted text; these spaces are not present in the FoLiA source. E.g. see this part of the attached file:

<t-str xml:id="FA-Prototyp_MWG-I-23_147-215_001.text.div1.p4.t-str.9">
            <t-style><feat class="Times New Roman" subset="font_family"/><feat class="10." subset="font_size"/><feat class="{70B504C2-AD38-496F-9F2A-B6E0061724F6}" subset="font_style"/>Aufsatz im Logos IV (1913, S.253ff.</t-style>
            <t-style><feat class="superscript" subset="font_typeface"/><feat class="Times New Roman" subset="font_family"/><feat class="10." subset="font_size"/><feat class="{70B504C2-AD38-496F-9F2A-B6E0061724F6}" subset="font_style"/>a</t-style>
            <t-style><feat class="Times New Roman" subset="font_family"/><feat class="10." subset="font_size"/><feat class="{70B504C2-AD38-496F-9F2A-B6E0061724F6}" subset="font_style"/>)</t-style>
            <t-style><feat class="superscript" subset="font_typeface"/><feat class="Times New Roman" subset="font_family"/><feat class="10." subset="font_size"/><feat class="{70B504C2-AD38-496F-9F2A-B6E0061724F6}" subset="font_style"/>1</t-style>
            <t-style><feat class="Times New Roman" subset="font_family"/><feat class="10." subset="font_size"/><feat class="{70B504C2-AD38-496F-9F2A-B6E0061724F6}" subset="font_style"/> ist die Terminologie tunlichst ver<t-hbr/></t-style>
          </t-str>

Solving this in XSL will be hard, so it may need to be handled in a preprocessing step in folia2html itself. This issue relates to proycon/folia#92, proycon/folia#88, and LanguageMachines/foliautils#56.
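A minimal sketch of what such a preprocessing step could look like, assuming that whitespace-only text nodes directly inside `<t-str>` are pretty-printing residue rather than content (as in the example above, where the real space is part of the last `<t-style>`). The function names `local_name` and `strip_layout_whitespace` are hypothetical and not part of folia2html; the sketch uses only the Python standard library.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_layout_whitespace(root):
    """Drop whitespace-only text nodes that sit directly inside <t-str>.

    Assumption: such nodes are indentation, not content. Text inside
    <t-style> elements (including leading spaces) is left untouched.
    """
    for elem in root.iter():
        if not isinstance(elem.tag, str) or local_name(elem.tag) != "t-str":
            continue
        # Whitespace between <t-str> and its first child
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Whitespace after each direct child of <t-str>
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    return root


# Simplified stand-in for the reported fragment (features omitted)
sample = """<t-str>
    <t-style>S.253ff.</t-style>
    <t-style>a</t-style>
    <t-style>)</t-style>
    <t-style> ist die Terminologie</t-style>
</t-str>"""

root = strip_layout_whitespace(ET.fromstring(sample))
flattened = "".join(root.itertext())
print(flattened)
```

Whether whitespace-only nodes between `<t-style>` elements can ever be meaningful depends on how the FoLiA document was produced, so a real implementation would likely need a more careful rule than this one.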

proycon commented 3 years ago

Even when the input is on a single line (and with xsl:output indent="no"), the XSL transformation still adds a space between the resulting spans:

      <t><t-str><t-style><feat class="Times New Roman" subset="font_family"/><feat class="15." subset="font_size"/><feat class="{3C19F4A8-2234-4EE8-9373-EBFA03C5A2A4}" subset="font_style"/>Es entspricht einerseits nicht den Erwartungen der<t-hbr/></t-style></t-str><t-str><t-style><feat class="Times New Roman" subset="font_family"/><feat class="15." subset="font_size"/><feat class="{3C19F4A8-2234-4EE8-9373-EBFA03C5A2A4}" subset="font_style"/>jenigen, welche in betreff der Lage der Landarbeiter nur solche</t-style></t-str></t>
        <span class="str">
          <span class="style_none style_font_family_TimesNewRoman style_font_size_15 style_font_style_3C19F4A8-2234-4EE8-9373-EBFA03C5A2A4">Es entspricht einerseits nicht den Erwartungen der<span class="hbr">­</span></span>
        </span>
        <span class="str">
          <span class="style_none style_font_family_TimesNewRoman style_font_size_15 style_font_style_3C19F4A8-2234-4EE8-9373-EBFA03C5A2A4">jenigen, welche in betreff der Lage der Landarbeiter nur solche</span>
        </span>
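
To illustrate why the whitespace between the generated spans matters, here is a rough sketch that approximates how a browser collapses inline whitespace. It assumes the HTML output contains line breaks and indentation between spans as in the listing above; the class names are shortened for readability and `rendered_text` is a hypothetical helper, not part of folia2html.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect all text nodes of an HTML fragment in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html):
    """Approximate inline rendering: join text nodes, collapse whitespace runs."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


# Output with whitespace between the spans (simplified from the listing above)
indented = """<span class="str">
  <span class="style_none">Erwartungen der</span>
</span>
<span class="str">
  <span class="style_none">jenigen, welche</span>
</span>"""

# The same spans without any whitespace between them
compact = (
    '<span class="str"><span class="style_none">Erwartungen der</span></span>'
    '<span class="str"><span class="style_none">jenigen, welche</span></span>'
)

print(rendered_text(indented))
print(rendered_text(compact))
```

In the indented variant the word that was split across two `t-str` elements is rendered with a space in the middle; in the compact variant it is not.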