taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

rawAttributes behaviour change? #18

Closed minas90 closed 4 years ago

minas90 commented 4 years ago

On v1.2.0, rawAttributes returns values wrapped in double quotes ("). Also, what is the reason for no longer encoding entities in setAttribute?

taoqf commented 4 years ago

Yes, @minas90, because of this: https://github.com/taoqf/node-html-parser/issues/17. That's why I raised the version to 1.2. Maybe the attributes getter would do? I didn't see any reason to encode the value in setAttribute; I've done some tests in Chrome:

const div = document.createElement('div');
// undefined
div.setAttribute('a', '&<>')
// undefined
div
// <div a="&<>"></div>
div.getAttribute('a')
// "&<>"

Do you think we should encode the value? If so, please point me to an example.

minas90 commented 4 years ago

I see. For my use case, attributes is about 50% slower because it does the decoding. I will do some testing in Chrome to understand when it decodes or encodes, if at all.
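To make the cost difference concrete, here is a minimal sketch of why a decoded view of the attributes can be slower than the raw one. It is not node-html-parser's implementation: `decodeEntities`, `parseRawAttributes` and `decodeAll` are hypothetical helpers, and they handle only double-quoted values and a handful of named entities.

```javascript
// Hypothetical helpers for illustration only; not node-html-parser's actual code.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Decode a small set of named entities; anything else is left untouched.
function decodeEntities(text) {
  return text.replace(/&(amp|lt|gt|quot|apos);/g, (match, name) => NAMED_ENTITIES[name]);
}

// Split an attribute string such as `a="1" b="x &amp; y"` into a name -> raw value map.
// Only double-quoted values are handled in this sketch.
function parseRawAttributes(source) {
  const raw = {};
  const pattern = /([^\s=]+)="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    raw[match[1]] = match[2];
  }
  return raw;
}

// The decoded view makes one extra pass over every value.
function decodeAll(raw) {
  const decoded = {};
  for (const [name, value] of Object.entries(raw)) {
    decoded[name] = decodeEntities(value);
  }
  return decoded;
}

const raw = parseRawAttributes('href="/search?q=a&amp;b" title="&lt;b&gt;"');
const decoded = decodeAll(raw);
console.log(raw.href);
console.log(decoded.href);
```

The raw map is available as soon as the string is split, while the decoded map pays for an additional replace over each value, which is the kind of overhead described above.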

minas90 commented 4 years ago

Yes, you are right: setAttribute doesn't encode the value in browsers. I fixed the issue in this change. OK, it broke some things; I will try to fix that now.
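The browser behaviour discussed in this thread (setAttribute stores the value as given, and getAttribute returns it unchanged) can be sketched roughly as follows. `SketchElement` is a hypothetical stand-in, not the library's HTMLElement, and it ignores details such as attribute name normalisation.

```javascript
// Hypothetical stand-in for an element; not node-html-parser's HTMLElement.
class SketchElement {
  constructor() {
    this.attrs = new Map();
  }

  // Store the value verbatim (coerced to a string), with no entity encoding.
  setAttribute(name, value) {
    this.attrs.set(name, String(value));
  }

  // Return exactly what was stored, or null when the attribute is absent.
  getAttribute(name) {
    return this.attrs.has(name) ? this.attrs.get(name) : null;
  }
}

const el = new SketchElement();
el.setAttribute('a', '&<>');
console.log(el.getAttribute('a'));
```

Under this model any escaping would belong to serialisation, not to the setter, so a set followed by a get round-trips the value unchanged.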

minas90 commented 4 years ago

I fixed some issues and tweaked some tests to match the behaviour of browsers. Now it's passing all the tests! I also added getAttribute, so it's the same as in Chrome now.

Please review the changes and merge them.