thi-ng / umbrella

⛱ Broadly scoped ecosystem & mono-repository of 198 TypeScript projects (and ~175 examples) for general-purpose, functional, data-driven development
https://thi.ng
Apache License 2.0
3.31k stars, 144 forks

FIX/packages/sax: make sure CDATA content is parsed into 'body' #479

Closed: guidoschmidt closed this 2 weeks ago

guidoschmidt commented 2 weeks ago

Hey hey,

Today I tried to use thi.ng/sax to parse an RSS XML feed, which may contain CDATA sections inside some elements, e.g.

<description>
<![CDATA[ News, stories, features and analysis ]]>
</description>

Unfortunately, the CDATA content was missing from the result. I'd expect it to be parsed as the body of the enclosing element. Here's a quick example:

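import * as sax from "@thi.ng/sax";
import * as tx from "@thi.ng/transducers";

// same element content, once as plain text and once wrapped in CDATA
const inputs = [
    `<a><d>Content</d></a>`,
    `<a><d><![CDATA[Content]]></d></a>`,
];

function parseAndLog(src: string) {
    console.group(`Input: ${src}`);
    // feed the string's characters through the SAX transducer and keep only
    // the last event, which here is the root element's end event with `children`
    const doc = tx.transduce(
        sax.parse({ entities: true, children: true }),
        tx.last(),
        src,
    );
    // log the first child of the root element, i.e. <d>
    console.log(doc!.children![0]);
    console.groupEnd();
}

for (const src of inputs) parseAndLog(src);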

which leads to the following output:

Input: <a><d>Content</d></a>
{tag: 'd', attribs: {…}, children: Array(0), body: 'Content'}

Input: <a><d><![CDATA[Content]]></d></a>
{tag: 'd', attribs: {…}, children: Array(0)} // missing "body: 'Content'" here

I spun up the debugger, and it looks like a line was missing here, which is needed for the CDATA content to be parsed as the body.
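The thread doesn't show the actual patch, so the following is only a hypothetical, self-contained sketch of the idea (it is not the @thi.ng/sax source; `parseTree` and `Elem` are made-up names, and neither the real one-line fix nor the nested-CDATA handling released later is reproduced). It assumes a tiny tree builder in which a CDATA section is treated as character data of the currently open element, so its content ends up in `body` exactly like plain text. Attributes, comments, entities, and self-closing tags are deliberately not supported.

```typescript
// Hypothetical sketch, NOT the @thi.ng/sax implementation: a tiny tree
// builder showing CDATA handled like plain text, i.e. its content is
// appended to the `body` of the currently open element.
// Not supported: attributes, comments, entities, self-closing tags.

interface Elem {
    tag: string;
    children: Elem[];
    body?: string;
}

function parseTree(src: string): Elem | undefined {
    const stack: Elem[] = [];
    let root: Elem | undefined;
    let i = 0;

    // append character data (plain text or CDATA) to the open element's body
    const appendBody = (text: string) => {
        const curr = stack[stack.length - 1];
        if (curr && text.length > 0) curr.body = (curr.body ?? "") + text;
    };

    while (i < src.length) {
        if (src.startsWith("<![CDATA[", i)) {
            // CDATA: take everything up to `]]>` verbatim, markup included
            const start = i + 9;
            const end = src.indexOf("]]>", start);
            if (end < 0) throw new Error("unterminated CDATA section");
            appendBody(src.slice(start, end));
            i = end + 3;
        } else if (src[i] === "<") {
            const end = src.indexOf(">", i);
            if (end < 0) throw new Error("unterminated tag");
            if (src[i + 1] === "/") {
                // closing tag: pop element, attach it to its parent (or make it root)
                const tag = src.slice(i + 2, end).trim();
                const el = stack.pop();
                if (!el || el.tag !== tag) throw new Error(`unexpected </${tag}>`);
                const parent = stack[stack.length - 1];
                if (parent) parent.children.push(el);
                else root = el;
            } else {
                // opening tag: only the name is kept, attributes are skipped
                const tag = src.slice(i + 1, end).trim().split(/\s+/)[0] ?? "";
                stack.push({ tag, children: [] });
            }
            i = end + 1;
        } else {
            // plain text up to the next `<` (or end of input)
            const next = src.indexOf("<", i);
            const stop = next < 0 ? src.length : next;
            appendBody(src.slice(i, stop));
            i = stop;
        }
    }
    return root;
}
```

With this sketch, `parseTree("<a><d><![CDATA[Content]]></d></a>")` yields a root `a` whose single child `d` has `body: "Content"`, the same shape as the plain-text input. Because CDATA is copied verbatim up to `]]>`, markup inside it (e.g. `<![CDATA[<bar>]]>`) stays text instead of becoming a child element.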

postspectacular commented 2 weeks ago

Thank you so much @guidoschmidt!!! Very good spotting & debugging! I'll add some more test cases & then release asap! 👍

postspectacular commented 2 weeks ago

Hi @guidoschmidt, I've made some more minor changes to also support nested CDATA (just in case), and it's all released now (v2.2.0).

One more small favor for the future: please use the Conventional Commits syntax for any commits to this repo. Because you didn't, your fix/commit is not included in the generated changelog (and I have no way of fixing that myself). There's also a brief overview here:

https://github.com/thi-ng/umbrella/blob/develop/CONTRIBUTING.md#commit-your-changes
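As a rough illustration of the commit format being requested, here is a hypothetical checker (not part of the umbrella tooling) for the Conventional Commits header shape `type(scope)!: description`. The allowed `TYPES` set and the use of the package name as scope (e.g. `fix(sax): ...`) are assumptions; the linked CONTRIBUTING.md is the authoritative source.

```typescript
// Hypothetical helper, not part of the umbrella repo: parses a commit
// subject line of the Conventional Commits form `type(scope)!: description`.

interface CommitHeader {
    type: string;
    scope?: string;
    breaking: boolean;
    description: string;
}

// Assumed set of allowed types; check CONTRIBUTING.md for the real list.
const TYPES = new Set(["feat", "fix", "refactor", "perf", "docs", "test", "build", "chore"]);

// type, optional (scope), optional `!` breaking marker, then `: ` and a description.
// Case-insensitive, since the spec doesn't treat the type as case sensitive.
const HEADER_RE = /^([a-z]+)(?:\(([^()\s]+)\))?(!)?: (\S.*)$/i;

function parseCommitHeader(line: string): CommitHeader | undefined {
    const m = HEADER_RE.exec(line);
    if (!m) return undefined;
    const type = m[1].toLowerCase();
    if (!TYPES.has(type)) return undefined;
    return {
        type,
        scope: m[2], // undefined if no scope was given
        breaking: m[3] === "!",
        description: m[4],
    };
}
```

With it, a header like `fix(sax): make sure CDATA content is parsed into body` is accepted (type `fix`, scope `sax`), while the PR's original `FIX/packages/sax: ...` form is rejected because the scope isn't wrapped in parentheses directly after the type.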

guidoschmidt commented 2 weeks ago

Ah neat, thanks for pointing it out. Will do in the future. I've been using my own commit-message syntax all along; it might be nice to switch to something more popular indeed 😏