taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Add source location to nodes #126

Closed: milahu closed this 3 years ago

milahu commented 3 years ago

continues #107

use case: detect the indent of a node, so I can insert new nodes and preserve the indent level

status: tests are failing

sample code: wrap a text node in a language-switch container

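// `html` is the original source string, `node` is a parsed element.
// `_source.start` is the offset this PR adds to each node.
const nodeStart = node._source.start;
// start of the line containing the node (0 if no newline precedes it)
const lineStart = html.lastIndexOf('\n', nodeStart) + 1;
// the leading whitespace of that line is the node's indent
const indent = html.slice(lineStart, nodeStart).match(/^\s*/)[0];

node.classList.add('langs');
// prefix every newline with the node's indent so the new child lines up
node.innerHTML = node.innerHTML.includes('\n')
  ? ([
      '',
      `  <div lang="en">`,
      `    ${node.innerHTML}`,
      `  </div>`,
      '',
    ].join('\n').replace(/\n/g, `\n${indent}`))
  : ([
      '',
      `  <div lang="en">${node.innerHTML}</div>`,
      '',
    ].join('\n').replace(/\n/g, `\n${indent}`))
;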
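The indent detection can be tried without the parser. The following is a minimal sketch: `indentAt` is a hypothetical helper name, and the node offset is found with `indexOf` instead of being read from `_source.start`, so it only assumes that the offset is an index into the raw input string.

```javascript
// Hypothetical helper (not part of node-html-parser): returns the leading
// whitespace of the line that contains `offset` in `source`.
function indentAt(source, offset) {
  // lastIndexOf returns -1 when no newline precedes the offset, so +1 gives 0.
  const lineStart = source.lastIndexOf('\n', offset) + 1;
  // The slice holds everything between the line start and the node start.
  return source.slice(lineStart, offset).match(/^\s*/)[0];
}

const html = '<html>\n  <div class="old">\n    hello\n  </div>\n</html>\n';

// Stand-in for node._source.start: locate the opening tag in the raw string.
const divStart = html.indexOf('<div');
const indent = indentAt(html, divStart);

console.log(JSON.stringify(indent)); // prints "  " (two spaces, quoted)
```

A node on the first line of the input has no preceding newline, and the helper then measures from the start of the string.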

small edits:

taoqf commented 3 years ago

Thanks. I looked into this but really could not understand it. What is the purpose of this PR? Would it slow down parsing, and is it worth it?

milahu commented 3 years ago

What is the purpose of this PR?

assume we have this input:

<html>
  <div class="old">
    hello
  </div>
</html>

I want to insert nodes into the <div class="old"> node, but the indent should be preserved:

<html>
  <div class="old">
    <div class="new">
      hello
    </div>
  </div>
</html>

with my patch, the parser adds the source location to every node, for example nodeDivOld._source.start == 20

to get the indent, I use the sample code from my first post
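To make the transformation concrete, here is a string-only sketch that turns the input above into the desired output. It does not use the parser: the offsets are found with `indexOf` as stand-ins for what a source location would provide, and the names `open`, `close`, `wrapped`, and `output` are illustrative only. It also trims the inner text, which is a simplification that only suits single-line content.

```javascript
const input = [
  '<html>',
  '  <div class="old">',
  '    hello',
  '  </div>',
  '</html>',
].join('\n');

// Stand-ins for parser-provided source locations (assumed, not the real API).
const open = '<div class="old">';
const close = '</div>';
const start = input.indexOf(open);
const innerStart = start + open.length;
const innerEnd = input.indexOf(close, innerStart);

// Indent of the line on which the outer <div> starts.
const lineStart = input.lastIndexOf('\n', start) + 1;
const indent = input.slice(lineStart, start).match(/^\s*/)[0];

// Inner text without its surrounding whitespace.
const text = input.slice(innerStart, innerEnd).trim();

// Build the new child relative to the parent, then shift every line
// by the parent's indent.
const wrapped = [
  '',
  '  <div class="new">',
  `    ${text}`,
  '  </div>',
  '',
].join('\n').replace(/\n/g, `\n${indent}`);

const output = input.slice(0, innerStart) + wrapped + input.slice(innerEnd);
console.log(output);
```

The trailing empty string in the array is what puts the parent's closing tag back on its own line at the parent's indent.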

Would it slow down parsing, and is it worth it?

no benchmarks yet ...

this could be disabled by default, and enabled via parser options

this could be optimized with getter functions, since every stored position is shifted by a constant offset of

const dataOffset = `<${frameflag}>`.length;

which could be subtracted in the getter function
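A rough sketch of that getter idea, under stated assumptions: the `SourceRange` class, its field names, and the `'root'` value for `frameflag` are all placeholders for illustration and are not taken from the library. The point is only that raw positions measured in the wrapped string can be stored as-is and corrected lazily on read.

```javascript
// Placeholder value, assumed for illustration only.
const frameflag = 'root';
const dataOffset = `<${frameflag}>`.length;

// Hypothetical container for a node's position. Raw offsets refer to the
// wrapped string; the getters translate them back to the user's input.
class SourceRange {
  constructor(rawStart, rawEnd) {
    this.rawStart = rawStart;
    this.rawEnd = rawEnd;
  }
  get start() {
    return this.rawStart - dataOffset;
  }
  get end() {
    return this.rawEnd - dataOffset;
  }
}

const data = '<p>hi</p>';
const wrappedData = `<${frameflag}>${data}</${frameflag}>`;

// Simulate what a parser working on the wrapped string would record.
const rawStart = wrappedData.indexOf('<p>');
const range = new SourceRange(rawStart, rawStart + data.length);

console.log(range.start, range.end); // prints: 0 9
```

With this shape the subtraction is only paid when a caller actually reads a position, so parsing itself just stores two numbers per node.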

nonara commented 3 years ago

@milahu I believe #138 should solve your issue