isaacs / sax-js

A sax style parser for JS
Other
1.09k stars 325 forks source link

Entity declarations with XML elements or other entities #260

Closed HeikoTheissen closed 1 year ago

HeikoTheissen commented 1 year ago

Issue 1

Entity declarations that involve XML elements, like in

<!DOCTYPE A [
  <!ENTITY elem '<B/>'>
]>
<A>&elem;</A>

wrongly contribute a text node, which leads to

<A>&lt;B/&gt;</A>

whereas they should contribute XML elements and lead to

<A><B/></A>

Without the patch to lib/sax.js, the new test case test/entity-elem.js fails with

not ok 12 - should be equivalent
  ---
  diff: |
    --- expected
    +++ actual
    @@ -1,1 +1,1 @@
    -"2.1"
    +"2.&attr;<B ATTR=\"&attr;.3\"/>"

With the patch, it passes.

Issue 2

Entity declarations that involve other entities, like in

<!DOCTYPE A [
  <!ENTITY attr '1'>
  <!ENTITY text '2.&attr;'>
]>
<A>&text;</A>

are completely resolved, leading to

<A>2.&amp;attr;</A>

instead of

<A>2.1</A>

Without the patch to lib/sax.js, the new test case test/entity-recursive.js fails with

not ok 9 - should be equivalent
  ---
  diff: |
    --- expected
    +++ actual
    @@ -1,1 +1,1 @@
    -"2.1"
    +"2.&attr;"

With the patch, it passes.

isaacs commented 1 year ago

This is a great thing to support, and I'm frankly amazed at how small the code change is.

I'm wondering if it's a patch or breaking change, if anyone is depending on the incorrect behavior, since this library is so old and has been mostly untouched for so long. Also, I have a few ideas of some cases that might surface some edge cases, but I wouldn't be surprised if just works as it is. I'll poke at it tonight, thanks!

HeikoTheissen commented 1 year ago

To avoid a breaking change, the new behavior could be made conditional on a setting during sax instantiation:

HeikoTheissen commented 1 year ago

Thanks for merging this and updating https://www.npmjs.com/package/sax !

Do you also plan to mention the new option unparsedEntities on the README.md?

SethFalco commented 8 months ago

Hey hey! Sorry to pester, just noting that this implementation actually has a bug that renders the new option unusable in many cases. In SVGO, it's preventing us from migrating from our (unmaintained) fork of sax back to upstream.

I've already opened a pull request that explains and resolves the issue, includes tests, and I also checked by parsing the conformance tests for XML to ensure no exceptions occur. Before, with the option enabled, the conformance test suite threw an exception.

Would really appreciate it if others could take a peek when they have time.

I've also fixed a few other bugs impacting us in separate pull requests btw.