ndw / xmlcalabash1

XML Calabash, an XProc processor
http://xmlcalabash.com/

Wrong base-uri(/*) returned with Saxon 9.7 and 9.8 under certain conditions #281

Open gimsieke opened 6 years ago

gimsieke commented 6 years ago

While many of our and our customers’ pipelines could be migrated from Calabash 1.1.15 with Saxon 9.6 to Calabash 1.1.21 with Saxon 9.8, I noticed a regression in a specific project. After hours of debugging, I managed to reproduce it with a minimal example.

The source in this example, Untitled2.xml, is

<?xml version="1.0" encoding="UTF-8"?>
<doc xml:base="file:/foo/bar.xml">
  <foo/>
</doc>

The pipeline, Untitled4.xpl, is

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:cx="http://xmlcalabash.com/ns/extensions"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="mystep">

  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

  <cx:message>
    <p:with-option name="message"
      select="'before:   base-uri(): ',   base-uri(),
                     ',  /*/@xml:base: ', /*/@xml:base,
                     ',  base-uri(/*): ', base-uri(/*)"/> 
  </cx:message>

  <p:xslt name="xslt">
    <p:input port="parameters">
      <p:empty/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="Untitled3.xsl"/>
    </p:input>
  </p:xslt>

  <cx:message>
    <p:with-option name="message"
      select="' after:   base-uri(): ',   base-uri(),
                     ',  /*/@xml:base: ', /*/@xml:base,
                     ',  base-uri(/*): ', base-uri(/*)"/> 
  </cx:message>

</p:declare-step>

The XSLT, Untitled3.xsl, is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math"
  exclude-result-prefixes="xs math"
  version="3.0">

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="foo">
    <xsl:result-document href="f">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>

  <xsl:template match="@xml:base"/>

</xsl:stylesheet>

What happens during the transformation is that /*/@xml:base is removed, and /doc/foo is sent to the secondary port by an xsl:result-document instruction.

Invoking it with Calabash 1.1.22 with Saxon 9.8 or Calabash 1.1.19 with Saxon 9.7 like this:

java -jar xmlcalabash-1.1.22-98.jar -i source=Untitled2.xml Untitled4.xpl

gives the same incorrect results:

Message: before:   
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: file:/foo/bar.xml,
  base-uri(/*): file:/foo/bar.xml
Message:  after: 
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: ,
  base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled3.xsl
<doc>

</doc>

This is incorrect: the result no longer has a /*/@xml:base attribute, so base-uri(/*) should be the same as base-uri(). Instead, base-uri(/*) is now the URI of the XSLT file. (It is not necessarily the URI of the XSLT file that contains the xsl:result-document instruction. In this example it is, because there is only a single XSLT file.)
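To spell out the expectation, here is a minimal Java sketch of the XML Base rule that the argument above relies on: an element's base URI comes from the nearest xml:base attribute on itself or an ancestor, and falls back to the document's URI when there is none. This is only an illustration using the JDK's DOM API; it is not how Saxon or XML Calabash compute base URIs. The class name BaseUriSketch, the helpers parse and baseUriOf, and the document URI file:/example/Untitled2.xml are all hypothetical placeholders.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of the XML Base rule; not Saxon or Calabash code.
class BaseUriSketch {

    // Parse an XML string with a namespace-aware parser so that xml:base
    // is visible as an attribute in the XML namespace.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example document", e);
        }
    }

    // Base URI of a node: the parent's base URI, overridden by the node's own
    // xml:base attribute if it has one. Anything that is not an element
    // (the document node, or null) gets the document's URI.
    static URI baseUriOf(Node node, URI documentUri) {
        if (!(node instanceof Element)) {
            return documentUri;
        }
        Element element = (Element) node;
        URI parentBase = baseUriOf(element.getParentNode(), documentUri);
        if (element.hasAttributeNS(XMLConstants.XML_NS_URI, "base")) {
            String xmlBase = element.getAttributeNS(XMLConstants.XML_NS_URI, "base");
            return parentBase.resolve(xmlBase);
        }
        return parentBase;
    }

    public static void main(String[] args) {
        // Placeholder document URI, assumed for this example only.
        URI documentUri = URI.create("file:/example/Untitled2.xml");

        // "Before": the root element still carries xml:base.
        Document before = parse("<doc xml:base=\"file:/foo/bar.xml\"><foo/></doc>");
        System.out.println(baseUriOf(before.getDocumentElement(), documentUri));
        // prints file:/foo/bar.xml

        // "After": xml:base has been removed, so the document URI applies.
        Document after = parse("<doc/>");
        System.out.println(baseUriOf(after.getDocumentElement(), documentUri));
        // prints file:/example/Untitled2.xml
    }
}
```

Under this rule, nothing in the transformed document points at the stylesheet, which is why the URI of Untitled3.xsl is unexpected as the value of base-uri(/*).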

The correct output, produced with the Saxon-9.6 versions of XML Calabash 1.1.15 or 1.1.19, is:

Message: before:   
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: file:/foo/bar.xml,
  base-uri(/*): file:/foo/bar.xml
Message:  after:  
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: ,
  base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml
<doc>

</doc>

It doesn’t matter that the attached XSLT is version 3.0; the same error occurs with 2.0.

ndw commented 6 years ago

There’s a lot of complex behavior going on here (thank you 1.0e6 for the small, focused test case). The relevant bit of code is in XSLT.java:

// Before Saxon 9.8, it was possible to simply set the base uri of the
// output document. That became impossible in Saxon 9.8, but I still
// think there might be XProc pipelines that rely on the fact that the
// base URI doesn't change when processed by XSLT. So we're doing it
// the hard way.
TreeWriter fixbase = new TreeWriter(runtime);
fixbase.startDocument(document.getBaseURI());
fixbase.addSubtree(xformed);
fixbase.endDocument();
xformed = fixbase.getResult();

For some reason, that doesn’t work for your stylesheet. Deep in the guts of the TinyTree implementation, there’s a systemIdMap with two entries in it, Untitled2.xml and Untitled3.xsl, and the second one is used.

In the course of misunderstanding the issue at first, I discovered that you can “fix” this bug by adding an explicit template for the document node to your stylesheet:

<xsl:template match="/">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

With that explicit copy, the systemIdMap has only a single value, Untitled2.xml.

Is that enough of a workaround for you?

(I’ll pass this along to Saxonica, but I have no idea if it’s a bug or not.)

ndw commented 6 years ago

Reported to Saxonica: https://markmail.org/thread/tsrtgohiby72v3ye

raducoravu commented 6 years ago

@ndw Maybe this issue is connected to this one: https://github.com/ndw/xmlcalabash1/issues/255

gimsieke commented 5 years ago

I tried it with Calabash 1.1.21 and Saxon PE 9.8.0.15. Still the same erroneous output.

gimsieke commented 5 years ago

Should be fixed for Saxon 9.9: https://saxonica.plan.io/issues/3956#note-10

ndw commented 5 years ago

I put together a 1.1.25 for Saxon 9.9(.1-2). Can you download it from here and let me know if it appears correct to you? (I haven't pushed it to Maven Central yet.)

gimsieke commented 5 years ago

Seems correct, at least for the example above.