polvi closed this 3 years ago
I reduced this to a minimal failing case:
faq <<EOF
<p>
<span>something</span>
text here
</p>
EOF
Error: failed to encode as: invalid attribute key label: #text - due to attributes not being prefixed
...
Here's the upstream error in the XML library we're using: https://github.com/clbanning/mxj/blob/13245dc365b0de3547c9845087941f04817e7936/xml.go#L1125-L1131
Taking a deeper look in a bit.
This also looks like wrong behavior (the `<span>` element is dropped from the output):
faq <<EOF
<p>
text here
<span>something</span>
</p>
EOF
<p>text here</p>
The author fixed this issue upstream.
I updated the dependency and, running the same document through it, discovered another upstream bug: https://github.com/clbanning/mxj/issues/91
Right now that specific document parses fine and jq expressions can be run against it, but it cannot be converted back into XML.
# Blocked on #91
curl -s https://www.govinfo.gov/bulkdata/CFR/2020/title-14/CFR-2020-title14-vol2.xml | ./faq
Error: failed to encode as pretty: xml.Decoder.Token() - XML syntax error on line 1: invalid character entity & (no semicolon)
# Works
curl -s https://www.govinfo.gov/bulkdata/CFR/2020/title-14/CFR-2020-title14-vol2.xml | ./faq -o json | head
{
"CFRDOC": {
"-noNamespaceSchemaLocation": "CFRMergedXML.xsd",
"-xsi": "http://www.w3.org/2001/XMLSchema-instance",
"AMDDATE": "Jan. 1, 2020",
"BMTR": {
"ALPHLIST": {
"AGENCY": [
"Administrative Conference of the United States",
"Advisory Council on Historic Preservation",
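The `xml.Decoder.Token()` error in the blocked case is what Go's standard decoder reports when it meets a bare `&` in character data. The sketch below only reproduces that class of error with `encoding/xml`. It does not show where in faq or mxj the unescaped ampersand is introduced, and the helper name `firstTokenError` is made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// firstTokenError reads tokens until the decoder fails or the input ends.
// It returns nil when the whole document tokenizes cleanly.
// Hypothetical helper, not taken from faq or mxj.
func firstTokenError(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// A bare ampersand is rejected by the default (strict) decoder.
	fmt.Println(firstTokenError("<a>x & y</a>"))
	// The escaped form tokenizes without error.
	fmt.Println(firstTokenError("<a>x &amp; y</a>"))
}
```

If decoded text containing `&` is written back out without re-escaping, a later decode pass would fail this way, which would be consistent with JSON output working while XML output does not.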
OK, both of these were fixed in fb4f6a4c352298b10c7677c9acfaa0dd78ec97d2 and 7f3a4184279af050fb1ee3ae146da716cea243f8.
This works...