usgpo / bulk-data

User Guides for XML on the govinfo Bulk Data Repository. For information about Bill Status XML Bulk Data, see https://github.com/usgpo/bill-status.
https://www.govinfo.gov/bulkdata

XSL error in billres.xsl isQuoteMustClose template with Saxon 9.9 #44

Open rhdunn opened 4 years ago

rhdunn commented 4 years ago

Hi,

When trying to use the Saxon 9.9 XSLT processor to transform a bill (e.g. https://www.govinfo.gov/content/pkg/BILLS-116hr823rh/xml/BILLS-116hr823rh.xml) using billres.xsl from https://www.govinfo.gov/bulkdata/BILLS/resources, I am getting the following error:

[XPTY0004] A sequence of more than one item is not allowed as the first argument of fn:name() (<subsection>, <subsection>) 
    at .../billres-details.xsl:18833:73

The following change fixes this issue:

-                   following-sibling::* and (name(self::*) = name(following-sibling::*)
+                   following-sibling::* and (name(self::*) = following-sibling::*/name()

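Specifically, name() accepts at most one node, so when the element has more than one following sibling, name(following-sibling::*) raises XPTY0004. The following-sibling::*/name() form instead applies name() to each following sibling. Because = is a general comparison, the test is then true if any following sibling has the same name as the current element.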

If the intended behaviour is to check only the immediately following sibling, as an XSLT 1.0 processor would (in XPath 1.0, name() of a node-set returns the name of the first node in document order), then the following change is needed instead:

-                   following-sibling::* and (name(self::*) = name(following-sibling::*)
+                   following-sibling::* and (name(self::*) = name(following-sibling::*[1])
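To make the difference between the two changes concrete, here is a small Python model (not an XPath engine) built on the standard-library ElementTree. It assumes elements without namespaces, so an element's tag stands in for name(). ElementTree has no parent pointers, so the parent is passed explicitly. The helper names and the sample section document are hypothetical.

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, elem):
    """following-sibling::* : the element children after elem, in document order."""
    children = list(parent)
    return children[children.index(elem) + 1:]


def same_name_as_any_following(parent, elem):
    """Model of: following-sibling::* and (name(self::*) = following-sibling::*/name())
    '=' is a general comparison, so it is true if ANY following sibling matches."""
    siblings = following_siblings(parent, elem)
    return bool(siblings) and any(s.tag == elem.tag for s in siblings)


def same_name_as_next(parent, elem):
    """Model of: following-sibling::* and (name(self::*) = name(following-sibling::*[1]))
    Only the immediately following sibling is compared."""
    siblings = following_siblings(parent, elem)
    return bool(siblings) and siblings[0].tag == elem.tag


section = ET.fromstring("<section><subsection/><paragraph/><subsection/></section>")
first = section[0]
print(same_name_as_any_following(section, first))  # True
print(same_name_as_next(section, first))           # False
```

In the sample, the first subsection's next sibling is a paragraph but a later sibling is another subsection, so the two readings disagree.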

Kind regards, Reece

llaplant commented 4 years ago

Thank you for letting us know about this. I’ll investigate and report back on the resolution.

rhdunn commented 4 years ago

There are other errors after fixing this one. Do you want me to report them as separate issues, add them here, or let you investigate the issues?

llaplant commented 4 years ago

If possible, could you add them here? Thank you!

rhdunn commented 4 years ago
[FORG0001] The string "(a)" cannot be cast to a boolean
    at .../billres-details.xsl:10011:141

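This issue is present in two places: line 10010 and line 10015. It is caused by the expression child::enum = not(''). Here not('') evaluates to true(), so each enum child's string value is cast to xs:boolean for the comparison, and "(a)" is not a valid boolean. The check should be not(child::enum = '').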
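The following Python sketch models why the original test fails and what the proposed fix computes. It is an approximation, not an XPath engine. cast_to_boolean mimics the xs:boolean lexical rules (true, false, 1, 0). old_test_xpath1 models how an XPath 1.0 processor compares a node-set with a boolean, by converting the node-set with boolean(). All function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """XPath string value of an element: all descendant text concatenated."""
    return "".join(elem.itertext())


def cast_to_boolean(lexical):
    """Model of casting an untyped value to xs:boolean: only 'true', 'false',
    '1' and '0' (after trimming whitespace) are valid; anything else is FORG0001."""
    value = lexical.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f'FORG0001: "{lexical}" cannot be cast to a boolean')


def old_test_xpath2(elem):
    """child::enum = not(''): each enum value is cast to boolean, then compared."""
    rhs = not ""  # not('') is true()
    return any(cast_to_boolean(string_value(e)) == rhs for e in elem.findall("enum"))


def old_test_xpath1(elem):
    """Same expression under XPath 1.0: the node-set is converted with boolean()."""
    rhs = not ""
    return bool(elem.findall("enum")) == rhs


def proposed_test(elem):
    """not(child::enum = ''): false only if some enum child has an empty value."""
    return not any(string_value(e) == "" for e in elem.findall("enum"))


sub = ET.fromstring("<subsection><enum>(a)</enum></subsection>")
bare = ET.fromstring("<subsection/>")
try:
    old_test_xpath2(sub)
except ValueError as err:
    print(err)  # FORG0001: "(a)" cannot be cast to a boolean
print(old_test_xpath1(sub), proposed_test(sub))    # True True
print(old_test_xpath1(bare), proposed_test(bare))  # False True
```

One difference worth checking: for an element with no enum child, the XPath 1.0 reading of the old test is false, while not(child::enum = '') is true. If "has an enum child" was the intent, exists(child::enum) may be closer. This has not been checked against how billres-details.xsl uses the test.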

rhdunn commented 4 years ago

There is a typo on line 13749 -- not(parent::toc/@chabged = 'not-changed') should be not(parent::toc/@changed = 'not-changed').

NOTE: The typo @chabged occurs 6 times in the billres-details.xsl file.

rhdunn commented 4 years ago
[XPTY0004] A sequence of more than one item is not allowed as the first argument of fn:contains() ("italic", "italic") 
    at .../billres-details.xsl:13752:20

This is because ancestor::*/@reported-display-style selects the attribute from every ancestor that has it, not just the first one. If only the nearest ancestor (the parent) should be checked, use ancestor::*[1]/@reported-display-style. If the nearest ancestor that carries the attribute should be used, use ancestor::*[@reported-display-style][1]/@reported-display-style. If any matching ancestor should be used, use ancestor::*[contains(@reported-display-style, 'bracket')].

NOTE: This condition will also match ancestor::amendment-block elements that don't satisfy the previous xsl:when condition. To avoid this, add not(self::amendment-block) to the predicate. For example, use ancestor::*[position() = 1 and not(self::amendment-block)]/@reported-display-style or ancestor::*[contains(@reported-display-style, 'bracket') and not(self::amendment-block)]. The extra condition excludes the amendment-block elements already covered by the previous xsl:when clause.

NOTE: This also applies to line 8389 in the isAncestorDeleted variable.

UPDATE 1: This applies to the other isAncestorDelete1, etc. variables in that template.

UPDATE 2: This also applies to line 8605 in the expression contains(ancestor::*/@reported-display-style, 'brackets').

UPDATE 3: The lines 4882, 4900, 7956, 9656, 9717, 19836, 19845, 19891, 22071, 22074, 22098, and 22100 are also affected.
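The ancestor::*/@reported-display-style problem can be modelled the same way. This Python sketch uses standard-library ElementTree, hypothetical helper names, and a hypothetical sample document with made-up attribute values. It lists ancestors nearest-first, as XPath does for the reverse ancestor axis, and compares the candidate expressions discussed above. xpath2_contains mimics the XPath 2.0 rule that fn:contains() accepts at most one item as its first argument.

```python
import xml.etree.ElementTree as ET

ATTR = "reported-display-style"


def ancestors(root, target):
    """ancestor::* in reverse-axis order: parent first, root element last."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = target
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


def all_values(chain):
    """ancestor::*/@reported-display-style: one value per ancestor that has it."""
    return [a.get(ATTR) for a in chain if a.get(ATTR) is not None]


def xpath2_contains(values, needle):
    """fn:contains() given a sequence as its first argument."""
    if len(values) > 1:
        raise TypeError("XPTY0004: more than one item as first argument of fn:contains()")
    return needle in (values[0] if values else "")


def parent_value(chain):
    """ancestor::*[1]/@reported-display-style: the parent's attribute only."""
    return all_values(chain[:1])


def nearest_value(chain):
    """ancestor::*[@reported-display-style][1]/@reported-display-style."""
    return all_values(chain)[:1]


def matching_ancestors(chain, needle, exclude="amendment-block"):
    """ancestor::*[contains(@reported-display-style, needle) and not(self::amendment-block)];
    pass exclude=None to drop the amendment-block condition."""
    return [a for a in chain if needle in (a.get(ATTR) or "") and a.tag != exclude]


root = ET.fromstring(
    '<amendment-block reported-display-style="brackets">'
    '<section reported-display-style="italic">'
    "<subsection><text/></subsection>"
    "</section></amendment-block>"
)
chain = ancestors(root, root.find(".//text"))
print([a.tag for a in chain])                     # ['subsection', 'section', 'amendment-block']
print(all_values(chain))                          # ['italic', 'brackets']
print(parent_value(chain), nearest_value(chain))  # [] ['italic']
print(len(matching_ancestors(chain, "bracket")))  # 0
print(len(matching_ancestors(chain, "bracket", exclude=None)))  # 1
```

Calling xpath2_contains(all_values(chain), 'bracket') on this sample raises the modelled XPTY0004 error, mirroring the reported failure. The parent-only and nearest-with-attribute readings differ here because the parent has no attribute.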

rhdunn commented 4 years ago
[XPTY0004] A sequence of more than one item is not allowed as the first argument of fn:local-name() (<title>, <title>) 
    at .../billres-details.xsl:18837:55

This is caused by the expression local-name(parent::*/following-sibling::*) != 'after-quoted-block'. It can be fixed in the same way as the name() issue above, e.g. parent::*/following-sibling::*/local-name() != 'after-quoted-block', or local-name(parent::*/following-sibling::*[1]) != 'after-quoted-block' to check only the next sibling.
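One point to keep in mind with the path form of this fix: != is also a general comparison. So parent::*/following-sibling::*/local-name() != 'after-quoted-block' is true if any following sibling is not an after-quoted-block, which is not the same as "the next sibling is not an after-quoted-block". The Python sketch below shows the two readings disagreeing. It uses standard-library ElementTree, hypothetical helper names and sample document, and tags standing in for local-name().

```python
import xml.etree.ElementTree as ET


def names_after_parent(root, elem):
    """local-name() of each element in parent::*/following-sibling::*, in order."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[elem]
    siblings = list(parent_of[parent])
    return [s.tag for s in siblings[siblings.index(parent) + 1:]]


def any_sibling_differs(names):
    """parent::*/following-sibling::*/local-name() != 'after-quoted-block'.
    Existential: false for an empty sequence, true if any name differs."""
    return any(name != "after-quoted-block" for name in names)


def next_sibling_differs(names):
    """local-name(parent::*/following-sibling::*[1]) != 'after-quoted-block'.
    local-name(()) is '', so having no following sibling gives true."""
    return (names[0] if names else "") != "after-quoted-block"


root = ET.fromstring(
    "<quoted-block><paragraph><text/></paragraph>"
    "<after-quoted-block/><paragraph/></quoted-block>"
)
names = names_after_parent(root, root.find(".//text"))
print(names)                        # ['after-quoted-block', 'paragraph']
print(any_sibling_differs(names))   # True
print(next_sibling_differs(names))  # False
```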

rhdunn commented 4 years ago

Fixing the issues described above is enough to make Saxon 9.9 process that XML file. I don't know whether other documents trigger further errors. For update 3 of the fn:contains issue, I located the lines to update with a search, so I don't know which of them are actually exercised by this XML document.
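A search like the one mentioned above can be scripted. The following Python sketch flags lines matching the patterns reported in this issue. The pattern list and function names are hypothetical heuristics. The script scans line by line, so an expression split across lines is missed. It also cannot tell which flagged lines a given bill actually exercises.

```python
import re

# Rough patterns for the problems reported in this issue (heuristics only).
PATTERNS = {
    "typo @chabged": re.compile(r"@chabged\b"),
    "multi-item contains()": re.compile(r"contains\(\s*ancestor::\*/@"),
    "multi-item name()": re.compile(r"name\(\s*following-sibling::\*\s*\)"),
    "multi-item local-name()": re.compile(
        r"local-name\(\s*parent::\*/following-sibling::\*\s*\)"
    ),
    "boolean cast": re.compile(r"=\s*not\(\s*''\s*\)"),
}


def find_suspects(text):
    """Return (line_number, label) pairs; line numbers are 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def report(path):
    """Print suspect lines of a stylesheet, e.g. report("billres-details.xsl")."""
    with open(path, encoding="utf-8") as f:
        for lineno, label in find_suspects(f.read()):
            print(f"{lineno}: {label}")
```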

rhdunn commented 4 years ago

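Also note that in XSLT 2.0, support for the disable-output-escaping attribute on xsl:text is an optional feature, so not all XSLT implementations support it. A processor that doesn't may report the recoverable error XTRE1620 and ignore the attribute. See https://www.w3.org/TR/xslt20/#err-XTRE1620.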