wilsonzlin / minify-html

Extremely fast and smart HTML + JS + CSS minifier, available for Rust, Deno, Java, Node.js, Python, Ruby, and WASM
MIT License

Some HTML entities are incorrectly transformed to UTF8 symbols (e.g. in URLs) #169

Open samupl opened 9 months ago

samupl commented 9 months ago

While working on a Django app, I noticed that some URLs were rendered incorrectly.

The URL in question had a query param called copy_origin. When the query param was not first (e.g. rendered as &copy_origin=something), the &copy part got transformed to the © symbol. This doesn't happen if the param is just called copy; the following underscore seems to make minify-html think it's a valid entity.

I found a few more examples.

This has been happening since at least 0.11, up to and including the latest version, 0.15:

❯ echo '<a href="/example?attribute=something&copy_something=1&reg_something=1&euro_something=1&yen_something=1">test</a>' | ./minhtml-0.15.0-x86_64-unknown-linux-gnu
<a href=/example?attribute=something©_something=1®_something=1&euro_something=1¥_something=1>test</a>

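
For comparison, the same URL can be run through Python's standard-library html.unescape, which is documented as following the HTML 5 rules for character references. This is only a sketch to show where the pattern might come from: it does not call minify-html, and the idea that minify-html applies similar decoding rules is an assumption. Under those rules, copy, reg and yen are among the names that are recognised without a trailing semicolon, while euro is not, which matches the output above. The sketch also shows the ampersands escaped as &amp;, the usual way to write a literal & inside an HTML attribute. Whether minify-html leaves &amp; untouched has not been checked here.

```python
import html

# The href value from the reproduction above, with raw "&" separators.
url = (
    "/example?attribute=something&copy_something=1"
    "&reg_something=1&euro_something=1&yen_something=1"
)

# Decode character references the way html.unescape does (HTML 5 rules).
# &copy, &reg and &yen are decoded even without ";", &euro is left alone.
decoded = html.unescape(url)
print(decoded)

# Escape the URL for use inside an attribute: every "&" becomes "&amp;".
escaped = html.escape(url, quote=True)
print(escaped)

# Decoding the escaped form gives back the original URL unchanged.
round_trip = html.unescape(escaped)
print(round_trip == url)
```

In a Django template, the escaped form is what autoescaping normally produces for a variable, so a URL that is built by hand or marked as safe would be the case to look at first.
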
samupl commented 8 months ago

@wilsonzlin Could you verify whether this is a bug, or whether I'm just making incorrect assumptions about the minification?

milen-denev commented 7 months ago

Hello there, I am leaving this link here to try out: https://denevcloud.azureedge.net/gumeristore/assets/js/minipopup-open.js. This file is not minified correctly, and the returned vector cannot be decoded. It contains UTF-8 chars. Try it yourself, guys.