libwww-perl / WWW-Mechanize

Handy web browsing in a Perl object
https://metacpan.org/pod/WWW::Mechanize

Respect CDATA sections when parsing HTML #298

Closed. Corion closed this 4 years ago.

Corion commented 4 years ago

This changes the HTML parser behaviour so that it properly respects CDATA sections (<![CDATA[ ... ]]>) and ignores link tags that appear inside JavaScript code.

The old behaviour can be restored by passing undef as the "marked_sections" option when creating the WWW::Mechanize object:

use WWW::Mechanize;

# Restore the pre-patch behaviour: do not honour marked sections
my $mech = WWW::Mechanize->new(
    marked_sections => undef,
);
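The option name matches HTML::Parser's `marked_sections` attribute, which is presumably what it controls. The following is a minimal sketch of that parser-level difference, assuming the HTML-Parser distribution (which provides HTML::TokeParser) is installed and was built with marked-section support. The sample HTML and the `hrefs` helper are hypothetical illustrations, not part of this patch or its test file. The sketch collects the href of every <a> tag the parser reports, first with marked sections honoured and then with them ignored.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page: one ordinary link, one link inside a CDATA section.
my $html = <<'HTML';
<p><a href="/real">real</a></p>
<![CDATA[ <a href="/hidden">hidden</a> ]]>
HTML

# Hypothetical helper: return the href of every <a> start tag the parser reports.
sub hrefs {
    my ($marked) = @_;
    my $p = HTML::TokeParser->new(\$html);
    # Set before the first get_tag call, because parsing only starts then.
    $p->marked_sections($marked);
    my @found;
    while (my $tag = $p->get_tag('a')) {
        push @found, $tag->[1]{href};    # element 1 is the attribute hash
    }
    return @found;
}

print "marked_sections on:  @{[ hrefs(1) ]}\n";
print "marked_sections off: @{[ hrefs(0) ]}\n";
```

With the flag on, the contents of the CDATA section should come back as plain text, so only /real is reported. With the flag off, HTML::Parser treats the <![CDATA[ markers as ordinary text, so the <a> tag inside them can be picked up as a link, which corresponds to the old behaviour described above.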

The patch also includes a (nasty) test file that checks both the old and the new behaviour.

See also the discussion at https://perlmonks.org/?node_id=11116478 and https://gist.github.com/haukex/fd76efa16f0b07ce6a7441d9b2265b2a for more context.

coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 337

Files with coverage reduction:
    File                    New missed lines    Coverage
    lib/WWW/Mechanize.pm    41                  93.95%

Totals:
    Change from base Build 336: 0.04%
    Covered lines: 758
    Relevant lines: 804

oalders commented 4 years ago

@Corion, would you be able to rebase this? I just merged a much older PR that touches the same code.

Corion commented 4 years ago

No problem. The rebased version passes the tests locally; let's see whether it also passes on CI.

oalders commented 4 years ago

Thanks @Corion!