taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

text should only return human readable text #198

Open ucarbehlul opened 2 years ago

ucarbehlul commented 2 years ago

I noticed that HTMLElement.text also returns the text content of script and style tags. The expected behavior is to exclude those, since innerText should return only human-readable content.

On the other hand, textContent can return all text content, even if it is not human readable.
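
To make the distinction concrete, here is a minimal sketch of the two behaviors. It does not use node-html-parser: the node shape (`tagName`, `childNodes`, `text`) and the helper names `allText` and `readableText` are assumptions made for illustration, and a real innerText also takes layout and visibility into account, which this ignores.

```javascript
// Hypothetical node shape, for illustration only (not the library's classes):
//   element node: { tagName: string, childNodes: Node[] }
//   text node:    { text: string }
const NON_READABLE = new Set(['script', 'style']);

// Concatenate every text node in the subtree, similar in spirit to textContent.
function allText(node) {
  if (typeof node.text === 'string') return node.text;
  return node.childNodes.map(allText).join('');
}

// Concatenate text nodes but skip script and style subtrees,
// which is the behavior this issue asks for.
function readableText(node) {
  if (typeof node.text === 'string') return node.text;
  if (NON_READABLE.has(node.tagName.toLowerCase())) return '';
  return node.childNodes.map(readableText).join('');
}

const tree = {
  tagName: 'div',
  childNodes: [
    { tagName: 'p', childNodes: [{ text: 'Hello' }] },
    { tagName: 'script', childNodes: [{ text: 'var x = 1;' }] },
    { tagName: 'style', childNodes: [{ text: 'p { color: red; }' }] },
  ],
};

console.log(allText(tree));      // includes the script and style text
console.log(readableText(tree)); // only the paragraph text
```

With this sample tree, `readableText` keeps only the paragraph text, while `allText` also includes the script and style contents.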

xileftenurb commented 1 year ago

I had the same issue. If your use case never needs the content of script and style tags, the parser has options to ignore those tags from the start:

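const HTMLParser = require('node-html-parser');

// `text` is the HTML source string to parse.
HTMLParser.parse(text, {
  comment: false, // do not retrieve comments
  // false: do not keep the text content of these elements when parsing
  blockTextElements: {
    noscript: false,
    script: false,
    style: false,
    pre: false,
  },
})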