goto100 / xpath

DOM 3 XPath implementation and helper for node.js
MIT License
223 stars 71 forks

`&quot;` parsing #103

Closed TheBestTvarynka closed 3 years ago

TheBestTvarynka commented 3 years ago

Hello everyone. My HTML page contains &quot; entities. Example:

const xpath = require('xpath');
const dom = require('xmldom').DOMParser;

// The source markup uses &quot; entities inside <title>
const file = "<book><title>&quot;Harry Potter&quot;</title></book>";
const doc = new dom().parseFromString(file);

// Select every <title> element in the parsed document
const nodes = xpath.select("//title", doc);
console.log(nodes);

After parsing, I get data where each &quot; has been replaced with ". Is there a way to keep &quot; in the data?

JLRishe commented 3 years ago

What you are describing has nothing to do with this library, which only provides a way to select DOM nodes. It doesn't have any influence over their behavior, properties, or methods.

I would question why you are trying to do what you describe. Whatever it is, there is probably a better way to do it that doesn't involve dealing directly with escaped text.

It doesn't seem that xmldom maintains a copy of the original, escaped text, but if you wanted to, you could re-escape the value:

const xmlEscape = require("xml-escape");
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;

const file = "<book><title>&quot;Harry Potter&quot;</title></book>";
const doc = new dom().parseFromString(file);
const titleNode = xpath.select1("//title", doc);

// textContent holds the decoded text; xmlEscape turns " back into &quot;
console.log(xmlEscape(titleNode.textContent));
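
If adding a dependency just for this is not desirable, the re-escaping step can be sketched by hand. The `escapeXml` helper and `XML_ENTITIES` table below are hypothetical names for illustration, not part of xpath, xmldom, or xml-escape; the sketch assumes that escaping the five predefined XML entities is enough for the use case.

```javascript
// Hypothetical helper (not part of xpath or xmldom): re-escapes the five
// predefined XML entities in an already-decoded string.
const XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

function escapeXml(text) {
  // A single pass over the string, so the "&" in an emitted entity
  // is never escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

console.log(escapeXml('"Harry Potter"')); // &quot;Harry Potter&quot;
```

Note that this only re-escapes; it cannot tell whether the source originally contained `&quot;`, `&#34;`, or a literal `"`, because the parsed text node holds the same decoded character in all three cases.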

TheBestTvarynka commented 3 years ago

Okay, thanks.