posthtml / posthtml-parser

Parse HTML/XML to PostHTMLTree
MIT License

HTML entities are being converted unexpectedly #51

Closed: cossssmin closed this issue 4 years ago

cossssmin commented 4 years ago

I've noticed that, starting with 0.5.1, HTML entities are being converted.

This is unexpected and in my case (Maizzle) breaks HTML emails.

For example, this:

<p>&copy; 2020</p>

... is output as:

<p>© 2020</p>

Here's another example:

<div>&zwnj;</div>

<div>&nbsp;</div>

... results in:

<div></div>

<div> </div>

These should stay as they are and should not be converted. Is there something users can do about it, or must it be handled in core?

cossssmin commented 4 years ago

Here's the result of using these in the parser.js test from PostHTML:

[Screenshot: output of the parser.js test]

Scrum commented 4 years ago

related to https://github.com/fb55/htmlparser2/issues/630

cossssmin commented 4 years ago

Found the breaking change in https://github.com/fb55/htmlparser2/commit/8ac01e0ed0950883f021b063ba0507fbae52252e: decodeEntities is now true by default. Setting it to false restores the previous behavior.
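To illustrate why flipping that default changes the output, here is a minimal, self-contained sketch. It is not htmlparser2's code: `readText` and `NAMED_ENTITIES` are hypothetical names made up for this example, and the entity table covers only the three entities mentioned in this thread. It only mimics the effect of a `decodeEntities` flag that defaults to true.

```javascript
// Hypothetical stand-in for an entity-decoding step (NOT htmlparser2's implementation).
// Only the entities mentioned in this issue are listed.
const NAMED_ENTITIES = {
  copy: '\u00A9', // ©
  nbsp: '\u00A0', // non-breaking space
  zwnj: '\u200C', // zero-width non-joiner
};

// Returns the text as a consumer would see it.
// decodeEntities defaults to true here to mirror the new default reported above.
function readText(text, options = {}) {
  const { decodeEntities = true } = options;
  if (!decodeEntities) {
    return text; // entities pass through untouched
  }
  return text.replace(/&([a-zA-Z]+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : match // unknown entities are left as written
  );
}

// Default (decoding on): the entity becomes a literal character.
console.log(readText('<p>&copy; 2020</p>'));
// Decoding off: the source is preserved, which is what HTML emails need.
console.log(readText('<p>&copy; 2020</p>', { decodeEntities: false }));
```

With the real library, the equivalent step is passing `decodeEntities: false` in the parser options, as described above.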

Scrum commented 4 years ago

Strange, the documentation says the default is false: https://github.com/fb55/htmlparser2/wiki/Parser-options#option-decodeentities

cossssmin commented 4 years ago

Yes, they probably forgot to update it 🤷‍♂️

fb55 commented 4 years ago

> Yes, they probably forgot to update it 🤷‍♂️

That's spot on, thanks for flagging :)