taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License
1.11k stars 107 forks source link

element.remove() silently fails if the element was added via `innerHTML` before #186

Closed iamakulov closed 2 years ago

iamakulov commented 2 years ago

Hey there,

I stumbled upon a bug with node removal:

const { parse } = require('node-html-parser');

const root = parse('<ul id="list"><li>Hello World</li></ul>');
root.querySelector('li').innerHTML = '<a href="#">Some link</a>';
root.querySelector('a').remove()
root.toString()
 // => Expected result: <ul id="list"><li></li></ul> 
 // => Actual result: <ul id="list"><li><a href="#">Some link</a></li></ul>

(executable notebook)

Why this happens

This happens because innerHTML, set_content(), replaceWith (and perhaps other methods?) fail to send a’s parentNode when adding it to children:

https://github.com/taoqf/node-html-parser/blob/b1ef785fc7ffe6b882f4fe7403c4638bada4dee5/src/nodes/html.ts#L338-L342

As a result, when remove() is invoked, querySelector('a').parentNode returns null, and nothing happens:

https://github.com/taoqf/node-html-parser/blob/b1ef785fc7ffe6b882f4fe7403c4638bada4dee5/src/nodes/html.ts#L207-L214

Workaround

A workaround for this is to use insertAdjacentHTML("beforeend", "..."). Internally, it adds elements through appendChild instead of a direct assignment, and appendChild takes care of setting parentNode:

https://github.com/taoqf/node-html-parser/blob/b1ef785fc7ffe6b882f4fe7403c4638bada4dee5/src/nodes/html.ts#L770-L774

nonara commented 2 years ago

Interesting! Good catch. Thank you for the investigation and detail.

N-index commented 2 years ago

Thanks for your analysis, the same issue occurs when passing the 'afterbegin' parameter to the `insertAdjacentHTML' method. For this case, my workaround is to get the first child element and call insertAdjacentHTML on it, then passing the 'beforebegin' parameter to the method.