terser / html-minifier-terser

actively maintained fork of html-minifier - minify HTML, CSS and JS code using terser - supports ES6 code
https://terser.org/html-minifier-terser
MIT License
385 stars, 32 forks

[Bug]: Problem with emojis and special characters #187

Open codydubat opened 1 week ago

codydubat commented 1 week ago

What happened?

Thank you for your work on this tool! I found this issue while minifying an HTML file: the degree symbol ° and emojis (like 🔥) are turned into garbled characters such as ┬░ and ƒöÑ. I couldn't find any config option to disable this behaviour. Is it a bug or intended? Thanks!

Version

7.2.0

What browsers are you seeing the problem on?

No response

Link to reproduce

No response

Relevant log output

No response

Willing to submit a PR?

None

BoGnY commented 23 hours ago

I have the same bug. I think it is related to the file encoding. In my case I run html-minifier-terser --collapse-whitespace --use-short-doctype --minify-css true page.html > page-min.html, but the resulting page-min.html is encoded as UTF-16 LE with BOM instead of UTF-8, and it is twice the size of the original file. After converting the file to UTF-8, its size is 30% of the original (a reduction of about 70%), but the special characters remain garbled.

EDIT: the problem is related to cmd/PowerShell. Running the same command in Git Bash creates a UTF-8 file.
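
For anyone wanting to check the encoding theory: the garbled characters in the original report are consistent with UTF-8 bytes being read back in a legacy console code page. The Python sketch below reproduces that, assuming the code page is CP437 (an assumption; it is a common OEM default on English-language Windows, and other code pages would garble differently). The helper names garble and repair are made up for this illustration and are not part of html-minifier-terser.

```python
# Assumption: the shell decodes the tool's UTF-8 output as CP437.
# Other console code pages would produce different garbage.

def garble(text: str, console_codepage: str = "cp437") -> str:
    """Encode text as UTF-8, then misread those bytes in a legacy code page."""
    return text.encode("utf-8").decode(console_codepage)


def repair(garbled: str, console_codepage: str = "cp437") -> str:
    """Undo the misreading; only works if no bytes were lost along the way."""
    return garbled.encode(console_codepage).decode("utf-8")


if __name__ == "__main__":
    for ch in ("°", "🔥"):
        bad = garble(ch)
        # Show the original, the garbled form, and the round-tripped result.
        print(repr(ch), "->", repr(bad), "->", repr(repair(bad)))
```

Under this assumption ° comes out as ┬░, matching the report, and 🔥 comes out as ≡ƒöÑ, which matches the reported ƒöÑ apart from a leading ≡ that may have been lost when the text was pasted. This would also explain why converting the UTF-16 file to UTF-8 afterwards does not help: the characters were already garbled before the file was written.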

codydubat commented 23 hours ago

@BoGnY Nice discovery, I didn't think about that. I'll run some tests to confirm.
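
One possible way to run that confirmation is to look at the first bytes of each output file and see which byte order mark, if any, the shell wrote. This is a minimal Python sketch; the function name sniff_bom is hypothetical, and a file without a BOM may still be UTF-8, so "no BOM" is not proof of any particular encoding.

```python
import codecs
import sys


def sniff_bom(data: bytes) -> str:
    """Return the encoding implied by a leading byte order mark, or 'no BOM'."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return "no BOM"


if __name__ == "__main__":
    # Usage: pass one or more file paths, e.g. the output written by each shell.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, sniff_bom(fh.read(4)))
```

Running it on a page-min.html produced by cmd/PowerShell and on one produced by Git Bash would show whether the two shells really write different encodings for the same command.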