taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

add source location to nodes #107

Closed milahu closed 3 years ago

milahu commented 3 years ago

Please expose each node's source location (e.g. `node._source.start`), so I can reduce a redundant list of nodes into a unique, sorted array.

demo

```js
// node-html-parser: `node._source.start` demo
// get a unique and sorted list of parent nodes from multiple child nodes

const { parse } = require('node-html-parser');

const insrc = `\
a
b
a
b
a
b
`;
// expected result: #d1, #d2, #d3, #d4

const root = parse(insrc);
const childSelectors = ['.a', '.b'];

// use start location as unique key
let parentNodes = new Map();
root.querySelectorAll(childSelectors.join(',')).map(node => {
  const p = node.parentNode;
  parentNodes.set(p._source.start, p);
});

// sort by start location
parentNodes = Array.from(parentNodes.entries()).sort((a, b) => a[0] - b[0]);

// print
for (const [start, parentNode] of parentNodes) {
  console.dir({
    start,
    str_expected: parentNode.toString(),
    str_actual__: (insrc.slice(start, start + 20) + ' ....')
  })
}
```
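The key-then-sort idea from the demo can be tried without the parser. This is only a sketch: the nodes are plain stand-in objects, and the `start` field and the `uniqueParentsByStart` helper are hypothetical names, not part of node-html-parser.

```js
// Stand-in parents with a hypothetical `start` offset (not a node-html-parser API).
const d1 = { id: 'd1', start: 0 };
const d2 = { id: 'd2', start: 40 };

// Stand-in children, deliberately listed out of document order.
const children = [
  { cls: 'b', parentNode: d2 },
  { cls: 'a', parentNode: d1 },
  { cls: 'a', parentNode: d2 },
  { cls: 'b', parentNode: d1 },
];

// Hypothetical helper: dedupe parents by start offset, then sort by that offset.
function uniqueParentsByStart(nodes) {
  const byStart = new Map();
  for (const node of nodes) {
    // A later entry with the same key overwrites the earlier one, so each parent appears once.
    byStart.set(node.parentNode.start, node.parentNode);
  }
  return [...byStart.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([, parent]) => parent);
}

const sortedParentIds = uniqueParentsByStart(children).map(p => p.id);
console.log(sortedParentIds); // [ 'd1', 'd2' ]
```

A numeric key gives both uniqueness and a sort order, which is why a source offset would be convenient here.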
milahu commented 3 years ago

closing, not worth the trouble. a `Set` is enough to get a unique list of parents:

```js
const { parse } = require('node-html-parser');

const insrc = `<html>....</html>`;
const root = parse(insrc);
const childSelectors = ['.a', '.b'];

// a Set keeps each parent object once, in first-seen order
const parentNodes = [...new Set(root.querySelectorAll(childSelectors.join(',')).map(node => node.parentNode))];
console.dir(parentNodes.map(n => n.toString()));
```
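The `Set` approach works because a `Set` compares objects by reference and remembers insertion order. A minimal sketch with stand-in objects (no parser involved; `d1`, `d2` and `matches` are made up for illustration):

```js
// Stand-in parents and children; the shapes are assumptions for illustration only.
const d1 = { id: 'd1' };
const d2 = { id: 'd2' };
const matches = [
  { cls: 'a', parentNode: d1 },
  { cls: 'b', parentNode: d1 },
  { cls: 'a', parentNode: d2 },
  { cls: 'b', parentNode: d2 },
];

// The same parent object is stored once; first-seen order is preserved.
const uniqueParents = [...new Set(matches.map(n => n.parentNode))];
const uniqueIds = uniqueParents.map(p => p.id);
console.log(uniqueIds); // [ 'd1', 'd2' ]

// A distinct object with equal contents is NOT merged: identity, not structure, is compared.
const lookalike = { id: 'd1' };
const withLookalike = new Set([d1, lookalike, d1]);
console.log(withLookalike.size); // 2
```

So the result is only sorted if the matches arrive in document order; unlike the offset key, the `Set` itself does no sorting.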