mojolicious / mojo

Mojo::DOM misparses <script> elements (another way) #2015

Closed mauke closed 1 year ago

mauke commented 1 year ago

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });

say for $dom->find('p')->each;

__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
    console.log('this is a script element and should be executed');
// </script asdf> <p>
    console.log('this is not a script');
    // <span data-wtf="</script>">:-)</span>

Expected behavior

Output similar to:

<p>
    console.log(&#39;this is not a script&#39;);
    // <span data-wtf="&lt;/script&gt;">:-)</span>
</p>

An (implicitly closed) p element exists, so it should be found.

Actual behavior

No output.

kraih commented 1 year ago

I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.

mauke commented 1 year ago

The relevant section is this one: https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state

After seeing </ (followed by a letter) in a <script> element, we end up in the "script data end tag name" state. Here we accumulate letters into the name of a temporary tag. On seeing whitespace (space, tab, line feed, form feed), we check that the temporary tag name matches "script"; if so, we stop script parsing (treating the characters found as a script end tag) and continue parsing for attributes.

Now, end tags with attributes are technically an error, but a forgiving parser will simply ignore them: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-attributes
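
As a rough illustration of the end-tag check described above, here is a minimal Perl sketch. It is not Mojo::DOM's code; `split_script_data` is a hypothetical helper invented for this example. It assumes the tag name is matched case-insensitively and that the name is terminated by tab, line feed, form feed, space, "/" or ">" (the spec lists the last two alongside whitespace). It deliberately ignores the "script data escaped" states that apply to `<!--` inside scripts.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mojo::DOM. Given the text that follows a
# <script> start tag, return the script source and the remainder that a
# tokenizer would go on to treat as the rest of the end tag (attributes, ">").
sub split_script_data {
    my ($text) = @_;

    # "</" + "script" (any case), then a lookahead for one of the characters
    # that terminate the tag name: tab, line feed, form feed, space, "/", ">".
    if ($text =~ m{</script(?=[\t\n\f />])}i) {
        my $source = substr $text, 0, $-[0];    # everything before "</script"
        my $rest   = substr $text, $+[0];       # everything after the tag name
        return ($source, $rest);
    }

    # No end tag found: the whole input is script data.
    return ($text, undef);
}

my ($source, $rest)
    = split_script_data("alert(1); // </scripty> </script asdf> <p>hi");
print "source: [$source]\n";
print "rest:   [$rest]\n";
```

In this sketch `</scripty>` stays part of the script source because the name does not end after "script", while `</script asdf>` ends the script. The remainder begins with ` asdf>`, which a forgiving parser would consume as ignored attributes before it reaches `<p>`.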