jzelinskie / faq

Format Agnostic jQ -- process various formats with libjq
Apache License 2.0

CDATA is not handled properly in XML content #72

Open wezm opened 4 years ago

wezm commented 4 years ago

faq fails with this error:

Error: failed to encode as pretty: xml.Decoder.Token() - XML syntax error on line 1: expected attribute name in element

when fed this valid XML/RSS file:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Read Rust - All</title>
    <description><![CDATA[All posts on Read Rust]]></description>
    <link>https://readrust.net/</link>
    <atom:link rel="self" href="https://readrust.net/all/feed.rss"/>
    <lastBuildDate>Tue, 22 Oct 2019 13:00:00 +0000</lastBuildDate>
    <item>
      <guid isPermaLink="false">79f41426-ad58-4eac-9adb-88b27c2a63ba</guid>
      <pubDate>Wed, 9 Oct 2019 00:00:00 +0000</pubDate>
      <title>How I handle errors in Rust</title>
      <link>https://blog.kiani.io/blog/how-i-handle-errors-in-rust/</link>
      <dc:creator>Ashkan Kiani</dc:creator>
      <description><![CDATA[derive_more is a crate which has many proc macros, amongst which is a macro for deriving From for structs, enums, and newtypes. From is the basic mechanism for using ? ergonomically in a function which returns Result<T, Error>. Almost everything I write has the derive_more crate as a dependency, and the following pattern for handling errors.]]></description>
    </item>
  </channel>
</rss>

The error seems to be caused by not handling the angle brackets in Result<T, Error> inside the CDATA.
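
A minimal sketch of that hypothesis, using only Go's standard encoding/xml package (the same xml.Decoder.Token() named in the error message). It assumes, without confirmation from faq's or mxj's source, that the CDATA text is decoded correctly on input and then written back out unescaped before being re-parsed for pretty-printing. The helper names textOf and escape are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textOf walks every token in doc and returns the concatenated character
// data, or the first error reported by xml.Decoder.Token().
func textOf(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		// CDATA sections are delivered as ordinary xml.CharData tokens.
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

// escape returns s with XML-special characters such as < and > escaped.
func escape(s string) string {
	var buf bytes.Buffer
	// Writes to a bytes.Buffer do not fail, so the error is ignored here.
	_ = xml.EscapeText(&buf, []byte(s))
	return buf.String()
}

func main() {
	const text = "Result<T, Error>"

	// 1. Inside CDATA the angle brackets are plain text and decode cleanly.
	got, err := textOf("<description><![CDATA[" + text + "]]></description>")
	fmt.Printf("cdata:     %q, err=%v\n", got, err)

	// 2. The same text written back without escaping: "<T, Error>" now looks
	// like the start of an element named T, and the decoder rejects it.
	_, err = textOf("<description>" + text + "</description>")
	fmt.Printf("unescaped: err=%v\n", err)

	// 3. Escaping the text before re-encoding round-trips without error.
	got, err = textOf("<description>" + escape(text) + "</description>")
	fmt.Printf("escaped:   %q, err=%v\n", got, err)
}
```

If the second case reproduces the message from the report, that would point at the re-encoding step rather than at CDATA parsing itself.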

jzelinskie commented 4 years ago

TBD whether this is just unsupported behavior in our XML dependency or whether there's a workaround: https://github.com/clbanning/mxj/blob/5042d4507dd4c1aa8a76aab57d05c07142b75e5c/xmlseq.go#L65