mentaljam / rollup-plugin-html2

Rollup plugin to inject bundled files to an HTML template
https://www.npmjs.com/package/rollup-plugin-html2
MIT License

Thou shall not parse HTML with RegEx #3

Closed moqmar closed 4 years ago

moqmar commented 4 years ago

In case you aren't aware of the meme: https://stackoverflow.com/a/1732454

Here are two fully valid HTML documents that fail with "Error: template must be an HTML or a file path" when used with this plugin:

<!doctype html>
<html>
    <!-- .* doesn't match newlines, and the doctype is in its own line -->
    <head>
        <title>Hello World</title>
        <meta charset="utf-8">
    </head>
    <body>
        <div id="app"></div>
    </body>
</html>

<!-- Hello World -->
<!doctype html>
<!-- What to do with those stupid comments around the doctype?! Aaaah the regex is falling apart -->
<html>
<title>Hello World</title>
<!-- what about leaving the end tag away? still completely valid! -->
<!-- or comments after the end tag? -->

Also, here are a few fully invalid HTML documents that still get accepted as valid HTML:

./a/completely/<html>/valid/linux/filepath</html>
<!--<html>Hello World, this neither has a doctype, nor a title (which is required).
Also, it's only an unterminated comment.</html>
<html>.png                                                                                          0000777 0001750 0001750 00000000104 13623230350 013231  0           ustar   ubuntu                        ubuntu                                                                                                                        PNG

IHDR            IDATxcd`    0/    IENDB`                                                                                                                        end.txt                                                                                             0000666 0001750 0001750 00000000010 13623230421 012664  0           ustar   ubuntu                        ubuntu                                                                                                                        </html>

The last one is a valid TAR archive (i.e. a binary file), consisting of a file named <html>.png and a text file with the content </html>. For the sake of demonstration, this isn't the full binary itself but a copy of its visual representation; the regular expression accepts both as valid HTML.

Sure, those are all edge cases, but the false negatives (strings which are valid HTML but aren't accepted as such) bug me a lot more.
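
Both failure modes can be reproduced in a few lines. This is only a sketch: the pattern below is a hypothetical stand-in for a single-line "does this look like HTML" check, not the expression the plugin actually uses, and `looksLikeHtml` is a name made up for illustration. It relies on the fact that `.` in a JavaScript regular expression does not match line terminators unless the `s` flag is set.

```javascript
// Hypothetical check, assumed for illustration only (NOT the plugin's real regex).
// Without the `s` flag, `.` stops at newlines.
const looksLikeHtml = (s) => /<html.*>.*<\/html>/.test(s);

// A valid multi-line document: <html> and </html> are on different lines.
const validDoc =
  "<!doctype html>\n<html>\n<head><title>Hello World</title></head>\n</html>";

// A file path that merely contains both tags on one line.
const filePath = "./a/completely/<html>/valid/linux/filepath</html>";

console.log(looksLikeHtml(validDoc)); // false (false negative)
console.log(looksLikeHtml(filePath)); // true  (false positive)
```

Adding the `s` flag would make the first case pass, but it would do nothing about the second one, which is why a different way of telling HTML from paths seems preferable.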

Proposed solutions:

  1. Use two properties, template for HTML, and templatePath for files.
  2. Just look for a single < to treat the template as an HTML template, and treat it as a file path otherwise. This character is disallowed in filenames under Windows anyway, and it's somewhat hard to use under Linux, so I'd say it's safe to assume that a string containing it is HTML.
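
A rough sketch of what the two proposals could look like in code. Everything here is an assumption made for illustration: `isInlineHtml` and `resolveTemplate` are made-up helper names, `templatePath` is the option name suggested in proposal 1, and the file reader is passed in as a callback so the snippet does not depend on any particular file API.

```javascript
// Proposal 2: any "<" means inline HTML; everything else is treated as a path.
function isInlineHtml(template) {
  return template.includes("<");
}

// Proposal 1: two separate options, so nothing has to be guessed.
// `readFile` is a hypothetical callback: (path) => file contents as a string.
function resolveTemplate({ template, templatePath }, readFile) {
  if (template !== undefined && templatePath !== undefined) {
    throw new Error("specify either `template` or `templatePath`, not both");
  }
  if (template !== undefined) return template;
  if (templatePath !== undefined) return readFile(templatePath);
  throw new Error("either `template` or `templatePath` is required");
}

// Usage, with a stub reader standing in for real file access:
const stubReader = (path) => `<!-- contents of ${path} -->`;
console.log(isInlineHtml("<!doctype html>\n<title>Hello World</title>")); // true
console.log(isInlineHtml("./src/index.html")); // false
console.log(resolveTemplate({ templatePath: "./src/index.html" }, stubReader));
```

With proposal 2, the contrived path `./a/completely/<html>/valid/linux/filepath</html>` would still be read as HTML, which is the trade-off accepted above. Proposal 1 avoids the guess entirely at the cost of a second option.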