taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

BUG: outerHTML drops <script>'s start tag #275

Open Zhang-Junzhi opened 5 months ago

Zhang-Junzhi commented 5 months ago
<!DOCTYPE html
><html lang="en"
><meta charset="UTF-8"
><title aria-live="polite">test</title
><link href="/style/app" rel="stylesheet"
><link rel="icon" href="/image/logo"><script src="/script/app" type="module"></script

></html>

Parse the above, then read the document text back from outerHTML. The <script> start tag is dropped, so the </script end tag immediately follows <link>:

<link rel="icon" href="/image/logo"></script

taoqf commented 4 months ago

The result is caused by the close tag. You may try stripping the newlines before each `>` prior to parsing:

import { parse } from 'node-html-parser';

// Strip the newlines that precede each '>' so every tag closes on one line.
const html = `<!DOCTYPE html
><html lang="en"
><meta charset="UTF-8"
><title aria-live="polite">test</title
><link href="/style/app" rel="stylesheet"
><link rel="icon" href="/image/logo"><script src="/script/app" type="module"></script

></html>`.replace(/\n*>/g, '>');
const root = parse(html);
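
To see what the suggested replace does on its own, here is a minimal sketch that runs it on a shortened copy of the input without involving the parser. The helper name `joinSplitTagEnds` is hypothetical and not part of node-html-parser. It only illustrates the string preprocessing step, not the parser's behaviour.

```javascript
// Hypothetical helper (not part of node-html-parser): removes any run of
// newlines that sits directly before a '>' so each tag closes on one line.
function joinSplitTagEnds(html) {
  return html.replace(/\n*>/g, '>');
}

// Shortened version of the markup from the report, including the
// blank line between "</script" and its closing ">".
const input = [
  '<!DOCTYPE html',
  '><html lang="en"',
  '><link rel="icon" href="/image/logo"><script src="/script/app" type="module"></script',
  '',
  '></html>',
].join('\n');

const normalized = joinSplitTagEnds(input);
console.log(normalized);
```

Note that the pattern is applied to the whole string, so it also removes newlines before any `>` that appears in text content or inside a script body, and it does not touch `\r` characters. For input like the sample above this makes no difference, but it is worth checking before applying it to arbitrary documents.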