shlomif / perl-XML-LibXML

The XML-LibXML CPAN Distribution for Processing XML using the libxml2 library
https://metacpan.org/release/XML-LibXML

Parse HTML string crashes on invalid charset meta #50

Open evostrov opened 3 years ago

evostrov commented 3 years ago

The following script runs into the problem, but eval doesn't trap an error:

use XML::LibXML;

my $content = q~
<!DOCTYPE html>
<html lang="en">
<head>
<meta content="text/html; charset=UTF-8; X-Content-Type-Options=nosniff" http-equiv="Content-Type" />
</head>
<body>
</body>
</html>
~;

my $parser = XML::LibXML->new();
$parser->recover_silently(1);
my $dom;
eval {
    $dom = $parser->parse_html_string($content, {'no_network' => 1});
};
warn 'err: '.$@;
warn "parsed string: ". $dom;

Output:

[warn] err:  at (eval 260) line 28.
unknown-56d0fa0:-1: output error : unknown encoding UTF-8; X-Content-Type-Options=nosniff
[warn] Use of uninitialized value $dom in concatenation (.) or string at (eval 260) line 29.
[warn] parsed string:  at (eval 260) line 29.

shlomif commented 3 years ago

Hi!

Sorry for the delay. Here is a reworked example:

#! /usr/bin/env perl
#
# Short description for xml-libxml.pl

use strict;
use warnings;
use 5.014;
use autodie;

use Path::Tiny qw/ path tempdir tempfile cwd /;

use XML::LibXML ();

my $content = <<"EOF";
<!DOCTYPE html>
<html lang="en">
<head>
<meta content="text/html; charset=UTF-8; X-Content-Type-Options=nosniff" http-equiv="Content-Type" />
</head>
<body>
</body>
</html>
EOF

my $parser = XML::LibXML->new();
$parser->recover_silently(1);
my $dom;
eval { $dom = $parser->parse_html_string( $content, { 'no_network' => 1 } ); };
warn '{[Caught an exception]err: ' . $@ . '}';
warn "parsed string: " . ( $dom // "<undef>" );
print "did not die.\n";

Gives:

{[Caught an exception]err: } at xml-libxml.pl line 29.
unknown-5565a23959c0:-1: output error : unknown encoding UTF-8; X-Content-Type-Options=nosniff
Use of uninitialized value $dom in concatenation (.) or string at xml-libxml.pl line 30.
parsed string:  at xml-libxml.pl line 30.
did not die.

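It seems the exception was caught. However, $dom tricks defined (https://perldoc.perl.org/functions/defined): the // "<undef>" fallback is not used, yet concatenating $dom still warns about an uninitialized value.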
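
A minimal sketch of a more defensive check, under two assumptions that have not been verified against the library source. The first is that $dom is a blessed XML::LibXML::Document. The second is that the warning comes from stringifying $dom, with the serialization step rejecting the bogus encoding name, rather than from $dom being undef. The sketch tests the return value of eval instead of $@, and it inspects the document through method calls so that $dom is never interpolated into a string. The variable names $ok and $root_name are only illustrative.

```perl
#! /usr/bin/env perl

use strict;
use warnings;

use Scalar::Util qw( blessed );
use XML::LibXML ();

# Same input as in the report above.
my $content = <<'EOF';
<!DOCTYPE html>
<html lang="en">
<head>
<meta content="text/html; charset=UTF-8; X-Content-Type-Options=nosniff" http-equiv="Content-Type" />
</head>
<body>
</body>
</html>
EOF

my $parser = XML::LibXML->new();
$parser->recover_silently(1);

my $dom;

# eval returns the trailing 1 only if the block finished without dying,
# so $ok tells us about exceptions even when $@ is empty or clobbered.
my $ok = eval {
    $dom = $parser->parse_html_string( $content, { no_network => 1 } );
    1;
};

if ( !$ok ) {
    warn "parser threw: " . ( $@ || 'unknown error' ) . "\n";
}
elsif ( !blessed($dom) ) {
    warn "parser returned no document object\n";
}
else {
    # Assumption: stringifying $dom is what fails, so only call methods on it.
    my $root      = $dom->documentElement;
    my $root_name = $root ? $root->nodeName : '(no root element)';
    print "got a ", ref($dom), " with root element ", $root_name, "\n";
}
```

If the second assumption holds, anything that serializes the document (string interpolation, toString) would need the same care as the concatenation in the scripts above.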