usgpo / api

services to access govinfo content and metadata
https://api.govinfo.gov

Which XSLT processor and configuration is congress.gov using? #166

Closed: iita-atii closed this issue 1 week ago

iita-atii commented 1 week ago

I'm trying to process bill XML files locally using the XSL files from govinfo.gov/bulkdata/BILLS/resources. Using SaxonJS 2.7, I'm getting cardinality errors like:

Required cardinality of first argument of name() is zero or one; supplied value contains 2 items

This seems to be related to handling multiple subsections. Helpful info would be:

  1. Which XSLT processor version are you using?
  2. Do you use any specific processor configuration or parameters?

I could edit the XSL files manually, but I'd rather not go in that direction, since it would make it hard to maintain parity with the upstream source for later updates.

I tried to look for any info regarding this in the docs but didn't find anything.

I'm new to working with this XML data, so I could be missing something obvious.

iita-atii commented 1 week ago

I was able to fix the issue using a less strict processor (Libxml), but am still curious to know what you are using on the backend.

jonquandt commented 1 week ago

We use the standard XML transformers bundled with recent LTS versions of Java. For web display, the standard XSLT 1.0 transforms built into browsers are used for non-USLM bill XML. USLM bills have a CSS file that is used to display the XML in the browser.

Here's an example bill that has both USLM and non-USLM xml: https://www.govinfo.gov/app/details/BILLS-118s1510enr
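For anyone who wants to try the bundled Java transformer locally, here is a minimal sketch. It is not GPO's actual configuration, which isn't described in this thread. The XML fragment and stylesheet are hypothetical stand-ins, not taken from the real bill files, and the assumption that the original error comes from a `name()` call receiving more than one node is only a guess based on the error message. The JDK's built-in processor implements XSLT 1.0, where `name()` applied to several nodes uses the first one in document order, whereas an XSLT 2.0+ processor such as SaxonJS reports a cardinality error for the same expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BundledXsltDemo {
    // Hypothetical stand-in for a bill fragment (not the real bill schema).
    static final String XML =
        "<section><subsection/><paragraph/></section>";

    // Hypothetical XSLT 1.0 stylesheet: name(*) is handed two child elements.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/section'>"
      + "<xsl:value-of select='name(*)'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) throws Exception {
        // newDefaultInstance() (Java 9+) always returns the JDK's built-in
        // implementation, even if another processor is on the classpath.
        TransformerFactory factory = TransformerFactory.newDefaultInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // XSLT 1.0 takes the first node of the set, so this prints "subsection".
        System.out.println(transform(XSLT, XML));
    }
}
```

To run it against real files, the two `StringReader` sources could be swapped for `new StreamSource(new File(...))` pointing at a downloaded bill and stylesheet.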

iita-atii commented 1 week ago

Thank you for your reply. I'm very happy to see the USLM work; the docs are much more approachable than trying to work through the original DTDs. Is there any place to track expected timelines and coverage for the USLM project?

Also, regarding different text versions of the same bill: in most cases, the govinfo bills bulk repositories seem to store only the most recent text version of a bill. Will the backfill work for USLM follow the same convention and produce a converted version only for the final text version?

llaplant commented 1 week ago

Recommend following https://github.com/usgpo/uslm. Additional bill versions will be made available in USLM in conjunction with GPO's new XML-based composition system, XPub. At this point, we do not have a timeline, but sample bill versions are available in USLM at https://github.com/usgpo/uslm/tree/main/bill-version-samples-september-2024. Also, GPO plans to make responsive HTML and XML available for bills and public laws, in conjunction with XPub. Samples are available at https://github.com/usgpo/xpub.

iita-atii commented 1 week ago

Got it, thanks for your responsiveness. Excited to see what comes of the XPub project.