Skip `<template />` tags in `textContent`

node-html-parser seems to include the text inside <template /> tags in the textContent property of ancestor elements:

import { parse } from 'node-html-parser';
const doc = parse('<div>Hello, <template>test</template>World!</div>');
console.log(doc.textContent); // "Hello,testWorld!"

This is contrary to browsers, where the contents of <template /> elements are ignored:

const doc = new DOMParser().parseFromString('<div>Hello, <template>test</template>World!</div>', 'text/html');
console.log(doc.documentElement.textContent); // "Hello, World!"

I'm using DOMParser for the example here, but reading textContent from the same markup in a live DOM in a browser gives the same output.

Two nuances here to keep in mind:

Whitespace is preserved around a <template /> tag. Note that the correct output is "Hello, World!" (with a space after the comma) because there was a space before the <template />. This is also true of Hello, <template></template> World!, where the whitespace on both sides of the tag is retained.
Declarative shadow DOM is implemented as <template shadowrootmode="open">Hello!</template>. The behavior here is awkward because in the browser you'd never actually observe this element in the real DOM; it gets converted into a real shadow root. Shadow roots are printed with textContent, but it's an open question whether that would be the intuitive behavior here. Personally, I think this should be interpreted as a shadow root and included in textContent, but I can see others disagreeing with me.
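
For what it's worth, the behavior I'd expect can be sketched as a small tree walk. This is only an illustration and not node-html-parser's implementation: `textContentSkippingTemplates` and the `includeDeclarativeShadowRoots` option are hypothetical names, and the node shape (`nodeType`, `childNodes`, `rawTagName`, `rawText`, and `attributes` as a plain object) is an assumption modeled with plain objects so the snippet runs on its own. Entity decoding is left out.

```javascript
// Hypothetical helper, not part of node-html-parser's API.
// Assumed node shape: { nodeType, childNodes, rawTagName, attributes } for
// elements and { nodeType, rawText } for text nodes.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function textContentSkippingTemplates(node, options = {}) {
  const { includeDeclarativeShadowRoots = false } = options;

  // Text nodes contribute their text verbatim, so surrounding whitespace survives.
  if (node.nodeType === TEXT_NODE) return node.rawText;

  // Anything that is neither text nor an element (e.g. a comment) contributes nothing.
  if (node.nodeType !== ELEMENT_NODE) return '';

  const isTemplate = (node.rawTagName || '').toLowerCase() === 'template';
  if (isTemplate) {
    const isShadowRoot =
      node.attributes != null && 'shadowrootmode' in node.attributes;
    // Plain templates are always skipped; declarative shadow roots only on request.
    if (!(isShadowRoot && includeDeclarativeShadowRoots)) return '';
  }

  return node.childNodes
    .map((child) => textContentSkippingTemplates(child, options))
    .join('');
}

// Tiny stand-ins for parsed nodes, so the example needs no dependencies.
const text = (rawText) => ({ nodeType: TEXT_NODE, rawText });
const el = (rawTagName, attributes, ...childNodes) => ({
  nodeType: ELEMENT_NODE,
  rawTagName,
  attributes,
  childNodes,
});

// <div>Hello, <template>test</template>World!</div>
const div = el('div', {}, text('Hello, '), el('template', {}, text('test')), text('World!'));
console.log(textContentSkippingTemplates(div)); // "Hello, World!"

// <div><template shadowrootmode="open">Hello!</template> light</div>
const host = el('div', {}, el('template', { shadowrootmode: 'open' }, text('Hello!')), text(' light'));
console.log(textContentSkippingTemplates(host, { includeDeclarativeShadowRoots: true })); // "Hello! light"
```

Whether the shadow root option should default to on or off is exactly the open question from the second nuance, so the sketch leaves it as a flag rather than picking a side.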

I'm on node-html-parser@6.1.5, which is the current latest.

taoqf / node-html-parser

Skip `<template />` tags in `textContent` #235