taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

removeWhitespace + outerHTML don't remove meaningless white-spaces inside start and end tags #274

Closed Zhang-Junzhi closed 6 months ago

Zhang-Junzhi commented 7 months ago

For example,

<!DOCTYPE html
><html                     lang="en"
><meta charset="UTF-8"
><title>test</title

><p>test</p

></html>

The extra whitespace between html and lang="en" and the extra newlines inside the tags are not removed, even when I call removeWhitespace before reading outerHTML. All mainstream browsers discard this meaningless whitespace when parsing.
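
As a possible stopgap, the whitespace inside tags could be normalized on the serialized string itself. The sketch below is only an illustration: collapseTagWhitespace is a hypothetical helper, not part of node-html-parser, and it works on plain strings with regular expressions rather than on the parsed DOM. It assumes the input has no comments, scripts, or stray "<" characters in text, since those would also match the tag pattern.

```typescript
// Hypothetical helper (NOT a node-html-parser API): collapses whitespace
// inside start and end tags while leaving quoted attribute values untouched.
function collapseTagWhitespace(html: string): string {
  // A "tag" here is '<', then any mix of quoted strings and other
  // characters that are not quotes or '>', then the closing '>'.
  const tagPattern = /<(?:"[^"]*"|'[^']*'|[^'">])*>/g;

  return html.replace(tagPattern, (tag) =>
    tag
      // Splitting on a capturing group puts quoted values at odd indices.
      .split(/("[^"]*"|'[^']*')/)
      .map((piece, i) => (i % 2 === 1 ? piece : piece.replace(/\s+/g, " ")))
      .join("")
      // Drop whatever whitespace is left directly before the closing '>'.
      .replace(/\s+>$/, ">")
  );
}

// The markup from the report above, written as an array of lines.
const reported = [
  "<!DOCTYPE html",
  '><html                     lang="en"',
  '><meta charset="UTF-8"',
  "><title>test</title",
  "",
  "><p>test</p",
  "",
  "></html>",
].join("\n");

const collapsed = collapseTagWhitespace(reported);
console.log(collapsed);
```

Applied to the outerHTML string, this would give the compact form a browser effectively sees. Whitespace in text nodes is left alone, so it would complement removeWhitespace rather than replace it.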