Open kloczek opened 10 months ago
The failing test can be reduced to:
$ cat /tmp/test.t
use XML::LibXML;
sub handler {
return join(",",@_);
}
my $parser = XML::LibXML->new({
expand_entities => 1,
ext_ent_handler => \&handler,
});
my $doc = $parser->parse_string(<<'EOF');
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY b SYSTEM "file:///does-not-exist">
]>
<root>&b;</root>
EOF
The output looks like a random string is parsed because the exact invalid bytes changes with each run:
$ perl -Iblib/{lib,arch} /tmp/test.t
Entity: line 1: parser error : PCDATA invalid Char value 23
g�+�U
^
Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x94 0x2B 0xDD 0x55
g�+�U
^
Entity: line 1: parser error : Char 0x0 out of allowed range
g�+�U
^
Entity: line 1: parser error : PCDATA invalid Char value 0
g�+�U
^
Entity: line 1: parser error : Char 0x0 out of allowed range
g�+�U
^
Entity: line 1: parser error : PCDATA invalid Char value 0
g�+�U
^
Entity: line 1: parser error : PCDATA invalid Char value 5
g�+�U
^
:5: parser error : Entity 'b' failed to parse
<root>&b;</root>
^
Valgrind reveals memory errors:
$ valgrind -- perl -Iblib/{lib,arch} /tmp/test.t
==10869== Memcheck, a memory error detector
==10869== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==10869== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==10869== Command: perl -Iblib/lib -Iblib/arch /tmp/test.t
==10869==
==10869== Invalid read of size 8
==10869== at 0x484F2FE: memmove (vg_replace_strmem.c:1410)
==10869== by 0x5F20916: UnknownInlinedFun (string_fortified.h:29)
==10869== by 0x5F20916: xmlMemRead (xmlIO.c:2845)
==10869== by 0x5F28006: xmlParserInputBufferGrow (xmlIO.c:3213)
==10869== by 0x5EFAE2C: xmlParserGrow (parserInternals.c:537)
==10869== by 0x5F052B4: xmlDetectEncoding (parserInternals.c:1296)
==10869== by 0x5EFDD28: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12725)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== by 0x5E48CBF: XS_XML__LibXML__parse_string.lto_priv.0 (LibXML.xs:1755)
==10869== by 0x49A05A9: Perl_pp_entersub (pp_hot.c:5555)
==10869== Address 0x6184420 is 0 bytes inside a block of size 25 free'd
==10869== at 0x4845B2C: free (vg_replace_malloc.c:985)
==10869== by 0x49A61DD: Perl_sv_clear (sv.c:6919)
==10869== by 0x49A3DC1: Perl_sv_free2 (sv.c:7244)
==10869== by 0x49D084E: UnknownInlinedFun (sv_inline.h:715)
==10869== by 0x49D084E: Perl_free_tmps (scope.c:255)
==10869== by 0x5E4634C: LibXML_load_external_entity (LibXML.xs:879)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== Block was alloc'd at
==10869== at 0x484280F: malloc (vg_replace_malloc.c:442)
==10869== by 0x496BD62: Perl_safesysmalloc (util.c:161)
==10869== by 0x499EF8F: Perl_sv_grow (sv.c:1376)
==10869== by 0x49AA63A: Perl_sv_setsv_flags (sv.c:4646)
==10869== by 0x499A17E: Perl_leave_adjust_stacks (pp_hot.c:5149)
==10869== by 0x499ED8A: Perl_pp_leavesub (pp_hot.c:5225)
==10869== by 0x49915D7: Perl_runops_standard (run.c:41)
==10869== by 0x48D75B9: Perl_call_sv (perl.c:3150)
==10869== by 0x5E4601C: LibXML_load_external_entity (LibXML.xs:856)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869==
==10869== Invalid read of size 2
==10869== at 0x484F350: memmove (vg_replace_strmem.c:1410)
==10869== by 0x5F20916: UnknownInlinedFun (string_fortified.h:29)
==10869== by 0x5F20916: xmlMemRead (xmlIO.c:2845)
==10869== by 0x5F28006: xmlParserInputBufferGrow (xmlIO.c:3213)
==10869== by 0x5EFAE2C: xmlParserGrow (parserInternals.c:537)
==10869== by 0x5F052B4: xmlDetectEncoding (parserInternals.c:1296)
==10869== by 0x5EFDD28: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12725)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== by 0x5E48CBF: XS_XML__LibXML__parse_string.lto_priv.0 (LibXML.xs:1755)
==10869== by 0x49A05A9: Perl_pp_entersub (pp_hot.c:5555)
==10869== Address 0x6184430 is 16 bytes inside a block of size 25 free'd
==10869== at 0x4845B2C: free (vg_replace_malloc.c:985)
==10869== by 0x49A61DD: Perl_sv_clear (sv.c:6919)
==10869== by 0x49A3DC1: Perl_sv_free2 (sv.c:7244)
==10869== by 0x49D084E: UnknownInlinedFun (sv_inline.h:715)
==10869== by 0x49D084E: Perl_free_tmps (scope.c:255)
==10869== by 0x5E4634C: LibXML_load_external_entity (LibXML.xs:879)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== Block was alloc'd at
==10869== at 0x484280F: malloc (vg_replace_malloc.c:442)
==10869== by 0x496BD62: Perl_safesysmalloc (util.c:161)
==10869== by 0x499EF8F: Perl_sv_grow (sv.c:1376)
==10869== by 0x49AA63A: Perl_sv_setsv_flags (sv.c:4646)
==10869== by 0x499A17E: Perl_leave_adjust_stacks (pp_hot.c:5149)
==10869== by 0x499ED8A: Perl_pp_leavesub (pp_hot.c:5225)
==10869== by 0x49915D7: Perl_runops_standard (run.c:41)
==10869== by 0x48D75B9: Perl_call_sv (perl.c:3150)
==10869== by 0x5E4601C: LibXML_load_external_entity (LibXML.xs:856)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869==
==10869== Invalid read of size 2
==10869== at 0x484F35F: memmove (vg_replace_strmem.c:1410)
==10869== by 0x5F20916: UnknownInlinedFun (string_fortified.h:29)
==10869== by 0x5F20916: xmlMemRead (xmlIO.c:2845)
==10869== by 0x5F28006: xmlParserInputBufferGrow (xmlIO.c:3213)
==10869== by 0x5EFAE2C: xmlParserGrow (parserInternals.c:537)
==10869== by 0x5F052B4: xmlDetectEncoding (parserInternals.c:1296)
==10869== by 0x5EFDD28: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12725)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== by 0x5E48CBF: XS_XML__LibXML__parse_string.lto_priv.0 (LibXML.xs:1755)
==10869== by 0x49A05A9: Perl_pp_entersub (pp_hot.c:5555)
==10869== Address 0x6184434 is 20 bytes inside a block of size 25 free'd
==10869== at 0x4845B2C: free (vg_replace_malloc.c:985)
==10869== by 0x49A61DD: Perl_sv_clear (sv.c:6919)
==10869== by 0x49A3DC1: Perl_sv_free2 (sv.c:7244)
==10869== by 0x49D084E: UnknownInlinedFun (sv_inline.h:715)
==10869== by 0x49D084E: Perl_free_tmps (scope.c:255)
==10869== by 0x5E4634C: LibXML_load_external_entity (LibXML.xs:879)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== Block was alloc'd at
==10869== at 0x484280F: malloc (vg_replace_malloc.c:442)
==10869== by 0x496BD62: Perl_safesysmalloc (util.c:161)
==10869== by 0x499EF8F: Perl_sv_grow (sv.c:1376)
==10869== by 0x49AA63A: Perl_sv_setsv_flags (sv.c:4646)
==10869== by 0x499A17E: Perl_leave_adjust_stacks (pp_hot.c:5149)
==10869== by 0x499ED8A: Perl_pp_leavesub (pp_hot.c:5225)
==10869== by 0x49915D7: Perl_runops_standard (run.c:41)
==10869== by 0x48D75B9: Perl_call_sv (perl.c:3150)
==10869== by 0x5E4601C: LibXML_load_external_entity (LibXML.xs:856)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869==
==10869== Invalid read of size 1
==10869== at 0x484F380: memmove (vg_replace_strmem.c:1410)
==10869== by 0x5F20916: UnknownInlinedFun (string_fortified.h:29)
==10869== by 0x5F20916: xmlMemRead (xmlIO.c:2845)
==10869== by 0x5F28006: xmlParserInputBufferGrow (xmlIO.c:3213)
==10869== by 0x5EFAE2C: xmlParserGrow (parserInternals.c:537)
==10869== by 0x5F052B4: xmlDetectEncoding (parserInternals.c:1296)
==10869== by 0x5EFDD28: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12725)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== by 0x5E48CBF: XS_XML__LibXML__parse_string.lto_priv.0 (LibXML.xs:1755)
==10869== by 0x49A05A9: Perl_pp_entersub (pp_hot.c:5555)
==10869== Address 0x6184436 is 22 bytes inside a block of size 25 free'd
==10869== at 0x4845B2C: free (vg_replace_malloc.c:985)
==10869== by 0x49A61DD: Perl_sv_clear (sv.c:6919)
==10869== by 0x49A3DC1: Perl_sv_free2 (sv.c:7244)
==10869== by 0x49D084E: UnknownInlinedFun (sv_inline.h:715)
==10869== by 0x49D084E: Perl_free_tmps (scope.c:255)
==10869== by 0x5E4634C: LibXML_load_external_entity (LibXML.xs:879)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869== by 0x5EF2063: xmlParseReference (parser.c:7369)
==10869== by 0x5EFDA87: xmlParseContentInternal (parser.c:10268)
==10869== by 0x5EFF297: xmlParseElement (parser.c:10330)
==10869== by 0x5EFFEFF: xmlParseDocument (parser.c:11111)
==10869== Block was alloc'd at
==10869== at 0x484280F: malloc (vg_replace_malloc.c:442)
==10869== by 0x496BD62: Perl_safesysmalloc (util.c:161)
==10869== by 0x499EF8F: Perl_sv_grow (sv.c:1376)
==10869== by 0x49AA63A: Perl_sv_setsv_flags (sv.c:4646)
==10869== by 0x499A17E: Perl_leave_adjust_stacks (pp_hot.c:5149)
==10869== by 0x499ED8A: Perl_pp_leavesub (pp_hot.c:5225)
==10869== by 0x49915D7: Perl_runops_standard (run.c:41)
==10869== by 0x48D75B9: Perl_call_sv (perl.c:3150)
==10869== by 0x5E4601C: LibXML_load_external_entity (LibXML.xs:856)
==10869== by 0x5F276A6: UnknownInlinedFun (xmlIO.c:3977)
==10869== by 0x5F276A6: xmlLoadExternalEntity (xmlIO.c:3965)
==10869== by 0x5EF9FA7: xmlCreateEntityParserCtxtInternal (parser.c:13635)
==10869== by 0x5EFDC58: xmlParseExternalEntityPrivate.lto_priv.0 (parser.c:12681)
==10869==
==10869==
==10869== HEAP SUMMARY:
==10869== in use at exit: 4,507,710 bytes in 16,971 blocks
==10869== total heap usage: 53,270 allocs, 36,299 frees, 10,078,767 bytes allocated
==10869==
==10869== LEAK SUMMARY:
==10869== definitely lost: 24,148 bytes in 35 blocks
==10869== indirectly lost: 56,412 bytes in 27 blocks
==10869== possibly lost: 4,408,443 bytes in 16,872 blocks
==10869== still reachable: 18,707 bytes in 37 blocks
==10869== of which reachable via heuristic:
==10869== newarray : 15,008 bytes in 465 blocks
==10869== suppressed: 0 bytes in 0 blocks
==10869== Rerun with --leak-check=full to see details of leaked memory
==10869==
==10869== For lists of detected and suppressed errors, rerun with: -s
==10869== ERROR SUMMARY: 6 errors from 4 contexts (suppressed: 0 from 0)
It could be a bug in Perl. If the sub handler looks like this:
sub handler {
my $s;
$s = join(",", $_[0], undef);
$s = 'foo';
return $s;
}
It fails:
Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding !
��3
^
:5: parser error : Entity 'b' failed to parse
<root>&b;</root>
^
and valgrind reports the errors. But If you comment out the join() line:
sub handler {
my $s;
#$s = join(",", $_[0], undef);
$s = 'foo';
return $s;
}
It stops reporting the errors. The trigger is join() called on @_ and undef as the last item.
Just a s note that a string produced by join(... ,undef) does not contain NUL bytes. I verified it with Devel::Peek and with dumping the value from LibXML.xs.
I think I found it. LibXML_load_external_entity() does:
if (func != NULL && SvTRUE(*func)) {
dTHX;
dSP;
ENTER;
SAVETMPS;
PUSHMARK(SP) ;
XPUSHs(sv_2mortal(newSVpv((char*)URL, 0)));
XPUSHs(sv_2mortal(newSVpv((char*)ID, 0)));
PUTBACK;
count = call_sv(*func, G_SCALAR | G_EVAL);
SPAGAIN;
if (count == 0) {
croak("external entity handler did not return a value");
}
if (SvTRUE(ERRSV)) {
(void) POPs;
croak_obj;
}
results = POPs;
results_pv = SvPV(results, results_len);
input_buf = xmlParserInputBufferCreateMem(
→ results_pv,
results_len,
XML_CHAR_ENCODING_NONE
);
PUTBACK;
FREETMPS;
LEAVE;
return xmlNewIOInputStream(ctxt, input_buf, XML_CHAR_ENCODING_NONE);
}
A problem is that xmlParserInputBufferCreateMem() does not copy memory pointed by results_pv. It only stores the pointer and expets the memory is valid until destroying the xmlParserInputPtr. But that's not true. results_pv's memory is deallocated with PUTBACK;FREETMPS.
We need to to duplicate the string, or better make results immortal and destroy it with xmlParserInputPtr.
libxml2 maintainer here. This is a regression in libxml2 and will be fixed soon.
This should be fixed in libxml2 master here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/c2bbeed1fdf500d33ecf9a07213ab6088d255357
The fix will be included in the upcoming 2.12.4 release.
I confirm that after applying https://gitlab.gnome.org/GNOME/libxml2/-/commit/c2bbeed1fdf500d33ecf9a07213ab6088d255357.patch towards libxml2-2.12.3, and then running make test
in XML-LibXML-2.0209 I see
t/43options.t ...................................... ok
t/44extent.t ....................................... ok
t/45regex.t ........................................ ok
libxml2 2.12.4 containing the fix is available now: https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.4
Just tested perl-XML-LibXML with new ibxml2 2.12.4 and test suite is OK now. Thank you for your time 👍 😄
Here is build result:
And test suite: