Closed AlenToma closed 2 years ago
This lib is not supposed to deal with incorrect HTML. I am sorry about that. If you can fix this, I am happy to merge your PR.
I have checked the code and noticed something wrong in it that causes the output to break.
Check this line
// Single error <div> <h3> </div> handle: Just removes <h3>
oneBefore.removeChild(last);
This will remove the child and its content. Why don't we just close it?
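To make the question concrete, here is a minimal sketch of the three ways an unclosed <h3> inside a <div> could be handled. It uses a hypothetical ToyElement class written only for this illustration, not the library's own HTMLElement, so the method names and output format are assumptions.

```typescript
// Toy stand-in for an element node; NOT node-html-parser's HTMLElement.
class ToyElement {
  tagName: string;
  children: Array<ToyElement | string> = [];

  constructor(tagName: string) {
    this.tagName = tagName;
  }

  removeChild(child: ToyElement): void {
    this.children = this.children.filter((c) => c !== child);
  }

  toString(): string {
    const inner = this.children
      .map((c) => (typeof c === "string" ? c : c.toString()))
      .join("");
    return `<${this.tagName}>${inner}</${this.tagName}>`;
  }
}

// Builds the tree a parser might hold for the input "<div><h3>title</div>".
function buildUnclosed(): { div: ToyElement; h3: ToyElement } {
  const div = new ToyElement("div");
  const h3 = new ToyElement("h3");
  h3.children.push("title");
  div.children.push(h3);
  return { div, h3 };
}

// Strategy 1: remove the unclosed element. Its content is lost with it.
const dropped = buildUnclosed();
dropped.div.removeChild(dropped.h3);
const droppedHtml = dropped.div.toString();

// Strategy 2: remove the element but hoist its children into the parent.
const unwrapped = buildUnclosed();
unwrapped.div.removeChild(unwrapped.h3);
unwrapped.div.children.push(...unwrapped.h3.children);
const unwrappedHtml = unwrapped.div.toString();

// Strategy 3: leave the element in the tree, so serializing closes it.
const kept = buildUnclosed();
const keptHtml = kept.div.toString();

console.log(droppedHtml);
console.log(unwrappedHtml);
console.log(keptHtml);
```

Only the third strategy keeps both the tag and its content, which is the "just close it" behavior asked about here.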
Hi again. I inspected the code above and ran some tests.
I do not see why you remove the last element, since you have already found the unclosed tags.
Anyway, here is a possible solution that worked.
I added a parseNoneClosedTags option and changed the code to the following:
export function parse(data: string, options = { lowerCaseTagName: false, comment: false } as Partial<Options>) {
  const stack = base_parse(data, options);
  const [root] = stack;
  while (stack.length > 1) {
    // Handle each error element.
    const last = stack.pop();
    const oneBefore = arr_back(stack);
    if (last.parentNode && last.parentNode.parentNode) {
      if (last.parentNode === oneBefore && last.tagName === oneBefore.tagName) {
        // Pair error case <h3> <h3> handle: fixes to <h3> </h3>
        // This is wrong, because it moves the <h3> out of its correct position, which should be inside the current HTML element. See issue 152 for more info.
        if (options.parseNoneClosedTags !== true) {
          oneBefore.removeChild(last);
          last.childNodes.forEach((child) => {
            oneBefore.parentNode.appendChild(child);
          });
          stack.pop();
        }
      } else {
        // Single error <div> <h3> </div> handle: just removes <h3>
        // Why remove it? It is already an HTMLElement, and the missing </h3> is already added in this case. See issue 152 for more info.
        if (options.parseNoneClosedTags !== true) {
          oneBefore.removeChild(last);
          last.childNodes.forEach((child) => {
            oneBefore.appendChild(child);
          });
        }
      }
    } else {
      // If it's the final element, just skip.
    }
  }
  return root;
}
And here is the test for this issue, which passed.
const { parse } = require('@test/test-target');
describe('issue 152', function () {
  it('should parse attributes right', function () {
    const html = `<div>
<div id="chr-content">
<span>
lkjasdkjasdkljakldj
</div>
</div>`;
    const expected = `<div>
<div id="chr-content">
<span>
lkjasdkjasdkljakldj
</span></div></div>`;
    const root = parse(html, { parseNoneClosedTags: true });
    root.toString().should.eql(expected);
    // const div = root.firstChild;
    // div.getAttribute('#input').should.eql('');
    // div.getAttribute('(keyup)').should.eql('applyFilter($event)');
    // div.getAttribute('placeholder').should.eql('Ex. IMEI');
    // root.innerHTML.should.eql(html);
  });
});
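For anyone who wants to see the "close it instead of removing it" idea in isolation, here is a small self-contained sketch. It is a toy stack-based parser written for this comment, not the library's implementation: toyParse, toySerialize, and the node shapes are hypothetical names, and it ignores void elements, comments, and tag-name case.

```typescript
interface ToyText { kind: "text"; value: string }
interface ToyElement { kind: "element"; tag: string; open: string; children: ToyNode[] }
type ToyNode = ToyText | ToyElement;

// Parses a simplified HTML subset. Elements that are never closed stay in the tree.
function toyParse(html: string): ToyElement {
  const root: ToyElement = { kind: "element", tag: "", open: "", children: [] };
  const stack: ToyElement[] = [root];
  // Alternatives: closing tag | opening tag | run of text.
  const tokenRe = /<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)[^>]*>|[^<]+/g;
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(html)) !== null) {
    const top = stack[stack.length - 1];
    if (m[1] !== undefined) {
      // Closing tag: find the nearest open element with the same name.
      let i = stack.length - 1;
      while (i > 0 && stack[i].tag !== m[1]) i--;
      // Popping past unclosed descendants leaves them attached to their parent,
      // so they are implicitly closed. A stray closing tag (i === 0) is ignored.
      if (i > 0) stack.length = i;
    } else if (m[2] !== undefined) {
      const el: ToyElement = { kind: "element", tag: m[2], open: m[0], children: [] };
      top.children.push(el);
      stack.push(el);
    } else {
      top.children.push({ kind: "text", value: m[0] });
    }
  }
  return root;
}

// Serializes the tree, emitting a closing tag for every element.
function toySerialize(node: ToyNode): string {
  if (node.kind === "text") return node.value;
  const inner = node.children.map((c) => toySerialize(c)).join("");
  return node.tag === "" ? inner : `${node.open}${inner}</${node.tag}>`;
}

const broken = '<div><div id="chr-content"><span>text</div></div>';
const repaired = toySerialize(toyParse(broken));
console.log(repaired);
// → <div><div id="chr-content"><span>text</span></div></div>
```

The unclosed <span> is neither dropped nor unwrapped. It stays where it was opened and gets its closing tag at serialization time, which is the result the test above expects from the parseNoneClosedTags option.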
Could you please have a look and let me know whether this could work? If it does, please check it in and publish it on npm so we can use it.
Merged your code in v5.2.2.
Please tell me if there is a library that can parse malformed HTML.
I am unable to parse the current HTML for some reason.
My current code is
This is on Node.js, with the latest version (4.1.4).
Here is a Snack example I created that contains the problem, if you would like to test it.
snack
Right now I am using react-native-html-parser together with this library to fix the incorrect HTML that has no end tags. It seems that node-html-parser simply ignores and rewrites the HTML and removes <div id="chr-content"> for some reason.
Contributor's Note
Although this library was built with the known limitation of requiring proper HTML, we are looking at revising the logic in a way which will not impact performance but will be able to more reasonably handle issues of unmatched open and close tags.
This issue will be left open until that has been addressed
— @nonara