node-red / node-red-dev-cli

Command-line tool for Node-RED Node authors
Apache License 2.0
7 stars 6 forks source link

checkNodes function has incorrect regex #15

Open TotallyInformation opened 2 years ago

TotallyInformation commented 2 years ago

checknodes.js - node-red/node-red-dev-cli - GitHub1s

/<script.+?type=['"]text\/javascript['"].*?>([\S\s]*?)<\/script>/ig

This does not match valid HTML v5 javascript script tag which may not have the type attribute set.

From MDN:

The HTML5 specification urges authors to omit the attribute rather than provide a redundant MIME type.

This results in:

    ---Validating Nodes---
Unable to parse  nodes/wiser.js
Unable to parse  nodes/wiser-listen.js

It would probably be better to have a negative check to eliminate script tags that have type="text/html" or the older red attributes.

sammachin commented 2 years ago

@hardillb I think this is something you could look at in your node parsing library

hardillb commented 2 years ago

Try this:

/<script.?(?:type=['"]text\/javascript['"])?\s?>([\S\s]*?)<\/script>/ig

It probably needs a little more work, but that should match both

<script type="text/javascript">

And

<script>

Alkarex commented 2 years ago

A regex will never be robust for such cases. A proper DOM parsing + XPath or CSS query would be better

knolleary commented 2 years ago

It is largely a question of return on investment. The regex has been good enough for a number of years. A small tweak will fix it for this new case. Replacing the code with a full DOM parsing approach is certainly doable, but will require a lot of changes to the existing code rather than one strategic tweak to the regex.

Longer term there is a plan to pull out the node parsing code into a shared library that can be reused by this tool, the flow-library and Node-RED itself - each of which have a slightly different set of requirements on what information gets pulled out. That would be the place to consider an alternative approach to the parsing.