metanorma / mn2pdf

Metanorma XML to PDF

Acute accent is missing in PDF #231

Open Intelligent2013 opened 8 months ago

Intelligent2013 commented 8 months ago

Moved from https://github.com/metanorma/pdfa-iso-32000-2/issues/29.

Source: https://github.com/metanorma/pdfa-iso-32000-2/issues/28#issuecomment-1836104849

When the PDF is generated for the whole document, the acute accent is missing: image

When the PDF is generated for Annex D only (to save time while debugging), it renders correctly: image

The whole document contains MathML. mn2pdf generates Apache IF XML if the source XML contains MathML (it adds hidden math text for the 'copy-paste' feature). The IF XML contains the wrong character sequence:

<text x="84469" y="185852" dp="1 4532 Z3" foi:struct-ref="38">ᴀ</text>

(XSL-FO contains correct sequence ᴀ́)

instead of:

<text x="84469" y="185852" dp="1 4532 Z3" foi:struct-ref="38">ᴀ́</text>

I'll find out what causes the character conversion: Apache FOP, the Xalan processor, or mn2pdf.

For adoc:

Test Aacutesmall: &#x1D00;&#x0301;

Test Acircumflexsmall: &#x1D00;&#x0302;

Test Adieresissmall: &#x1D00;&#x0308;

The PDF renders as follows: image

Apache IF XML:

<text x="0" y="185852" dp="10 Z2 -891 Z35 -88 0" foi:struct-ref="37">Test Aacutesmall: </text>
<text x="84469" y="185852" dp="1 4532 Z3" foi:struct-ref="38">ᴀ</text>
...
<text x="0" y="207052" dp="16 Z2 -891 Z19 -77 Z11 -154 Z23 -121 Z3 -99 0" foi:struct-ref="3a">Test Acircumflexsmall: </text>
<text x="108218" y="207052" dp="1 4532 Z3" foi:struct-ref="3b">̂ᴀ</text>
...
<text x="0" y="228252" dp="10 Z2 -891 Z19 -77 Z15 -154 0" foi:struct-ref="3d">Test Adieresissmall: </text>
<text x="95854" y="228252" dp="1 4532 Z3" foi:struct-ref="3e">̈ᴀ</text>
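
To make the character order in such `<text>` content unambiguous while debugging, it can help to dump the code points. The following is only a minimal sketch: the class and method names are hypothetical and not part of mn2pdf or Apache FOP, and it assumes the text content has already been extracted from the IF XML as a Java string. It checks only for marks in the U+0300..U+036F block, the same range used by the XSLT regex.

```java
// Hypothetical debugging helper, not part of mn2pdf or Apache FOP.
class CombiningOrderSketch {

    // Render a string as space-separated code points, e.g. "U+1D00 U+0301".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // True if the string begins with a combining mark from U+0300..U+036F,
    // i.e. the mark has no base character in front of it.
    static boolean startsWithCombiningMark(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int cp = s.codePointAt(0);
        return cp >= 0x0300 && cp <= 0x036F;
    }
}
```

If the mark really does precede the base in the IF XML, `startsWithCombiningMark` would return true for the content of the second and third test lines, while the XSL-FO input (base first, then mark) would return false.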

Adoc with full set of combining chars: document.zip

PDF: document.presentation.pdf

This issue occurs because the sequence 'char x' + 'combining char y' is enclosed in the element <fo:inline xml:lang="none">...</fo:inline>:

        <!-- enclose sequence of 'char x' + 'combining char y' to <lang_none>xy</lang_none> -->
        <xsl:variable name="regex_combining_chars">(.[&#x300;-&#x36f;])</xsl:variable>
        <xsl:variable name="element_name_lang_none">lang_none</xsl:variable>
        <xsl:variable name="tag_element_name_lang_none_open">###<xsl:value-of select="$element_name_lang_none"/>###</xsl:variable>
        <xsl:variable name="tag_element_name_lang_none_close">###/<xsl:value-of select="$element_name_lang_none"/>###</xsl:variable>

        <xsl:template match="text()" mode="update_xml_step2">
            <xsl:variable name="text_" select="java:replaceAll(java:java.lang.String.new(.), $regex_combining_chars, concat($tag_element_name_lang_none_open,'$1',$tag_element_name_lang_none_close))"/>
            <xsl:call-template name="replace_text_tags">
                <xsl:with-param name="tag_open" select="$tag_element_name_lang_none_open"/>
                <xsl:with-param name="tag_close" select="$tag_element_name_lang_none_close"/>
                <xsl:with-param name="text" select="$text_"/>
            </xsl:call-template>
        </xsl:template>

...
    <!-- for correct rendering combining chars -->
    <xsl:template match="*[local-name() = 'lang_none']">
        <fo:inline xml:lang="none"><xsl:value-of select="."/></fo:inline>
    </xsl:template>
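
The string step of the template above can be tried in isolation. This is a minimal sketch, assuming the Xalan extension call maps directly to `java.lang.String.replaceAll`; the class name and the hard-coded marker strings are for illustration only and mirror the values the XSLT variables would produce.

```java
// Hypothetical standalone reproduction of the replaceAll step, not mn2pdf code.
class LangNoneWrapSketch {
    // Same pattern as $regex_combining_chars: any char followed by one mark from U+0300..U+036F.
    static final String REGEX = "(.[\\u0300-\\u036f])";
    static final String OPEN = "###lang_none###";
    static final String CLOSE = "###/lang_none###";

    // Wrap every 'char x' + 'combining char y' pair in the textual open/close markers.
    static String wrap(String text) {
        return text.replaceAll(REGEX, OPEN + "$1" + CLOSE);
    }

    public static void main(String[] args) {
        String wrapped = wrap("Test Aacutesmall: \u1D00\u0301");
        // Print code points so the order inside the markers is unambiguous.
        wrapped.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

In this sketch the base character and the mark stay in their original order inside the markers, which would suggest looking for the reordering later in the pipeline (the `replace_text_tags` step or FOP's handling of the resulting `fo:inline`), though that is not confirmed here.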

This workaround was added specifically to fix the issue with the position of combining chars (https://issues.apache.org/jira/browse/FOP-3065).

Without xml:lang="none", combining chars render with a slight horizontal shift: image

I'll find out why only a few combining chars render as #.