taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Automatically fill in missing close tags #170

Closed: depeiwang closed this issue 2 years ago

depeiwang commented 2 years ago

I have some non-standard HTML like this:

const badHtml = 
  `  <!DOCTYPE html>
    <html>

    <body>
    <div id="abc">
        <ol>
            <li>
                <p>hello</p>
            </li>
            <li>
                <p>world</p>
            </li>
            <li>
               <!-- miss </p> here -->
                <p>
            </li>

        </ol>
    </div>
    </body>

    </html>
    `;

It is missing a closing tag, </p>, and when I parse this HTML with node-html-parser I get a bad parse result. Here is my test code:

const { parse } = require("node-html-parser");

(() => {
    const doc = parse(badHtml);
    const div = doc.querySelector("#abc");
    console.log(div === null); // output: true
    console.log(doc.toString());
})();

When I print doc.toString() to the console, the result is:

    <!DOCTYPE html>
    <html>

            <li>
                <p>hello</p>
            </li>
            <li>
                <p>world</p>
            </li>

                <p>

    </p>

    </html>

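The div tag has been dropped, along with the body and ol tags, and the parser has added a closing </p> just before </html>. Is it possible to automatically fill in missing close tags instead?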

depeiwang commented 2 years ago

This looks like the same issue as https://github.com/taoqf/node-html-parser/issues/152
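
One possible workaround, sketched below, is to repair the markup before handing it to parse(). The helper closeMissingTags is hypothetical and is not part of node-html-parser. It walks the tags with a regular expression, keeps a stack of open elements, and when it meets a closing tag it first emits closing tags for anything still open inside that element. It assumes simple markup: it does not follow the HTML specification's error-recovery rules, and it will misbehave on script or style content or on attribute values that contain ">".

```javascript
// Hypothetical pre-processing helper; NOT part of node-html-parser.
// Elements that never take a closing tag.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

function closeMissingTags(html) {
  // Matches, in order: a comment, a doctype, or an opening/closing tag.
  const tokenRe =
    /<!--[\s\S]*?-->|<!doctype[^>]*>|<(\/?)([a-zA-Z][a-zA-Z0-9-]*)([^>]*)>/gi;
  const stack = []; // names of currently open elements
  let out = "";
  let last = 0;
  let m;

  while ((m = tokenRe.exec(html)) !== null) {
    out += html.slice(last, m.index); // copy text between tokens unchanged
    last = tokenRe.lastIndex;
    const [token, slash, rawName, rest] = m;

    if (rawName === undefined) {
      out += token; // comment or doctype: pass through untouched
      continue;
    }

    const name = rawName.toLowerCase();
    if (slash === "") {
      // Opening tag: remember it unless it is void or self-closed.
      if (!VOID_TAGS.has(name) && !rest.trim().endsWith("/")) {
        stack.push(name);
      }
      out += token;
    } else {
      const idx = stack.lastIndexOf(name);
      if (idx === -1) continue; // stray closing tag with no opener: drop it
      // Close every element opened after the matching opening tag.
      while (stack.length - 1 > idx) out += `</${stack.pop()}>`;
      stack.pop();
      out += token;
    }
  }

  out += html.slice(last);
  // Close anything still open at the end of the input.
  while (stack.length > 0) out += `</${stack.pop()}>`;
  return out;
}

console.log(closeMissingTags("<ol><li><p></li></ol>"));
```

With the badHtml from the report, the intended use would be parse(closeMissingTags(badHtml)), so that the parser only ever sees balanced tags. Note that the comment containing </p> is skipped as a whole, so it is not mistaken for a real closing tag.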