webpack-contrib / mini-css-extract-plugin

Lightweight CSS extraction plugin
MIT License
4.67k stars 388 forks source link

Should output UTF-8 BOM when input contains UTF-8 BOM #821

Open sammyhk opened 3 years ago

sammyhk commented 3 years ago

Bug report

When integrate with sass-loader, it will output CSS with UTF-8 BOM (i.e. \uFEFF) at the beginning of the CSS when it is run in production mode. But this plugin seems trimmed the UTF-8 BOM in the final output CSS file. This is a reproducible sample repository (https://github.com/sammyhk/scss-utf8.git). Execute npm run sass which call sass directly produce the CSS file (./dist/in.scss.bom.min.css) with UTF-8 BOM included. Execute npm run build which involve webpack, sass-loader and this plugin produce the CSS file (./dist/in.scss.min.css) without UTF-8 BOM.

Actual Behavior

Seems this plugin trimmed the UTF-8 BOM and output CSS file without BOM.

Expected Behavior

Should preserve the UTF-8 BOM if exists and output CSS file with BOM.

How Do We Reproduce?

This is a reproducible sample repository (https://github.com/sammyhk/scss-utf8.git). Execute npm run sass which call sass directly produce the CSS file (./dist/in.scss.bom.min.css) with UTF-8 BOM included. Execute npm run build which involve webpack, sass-loader and this plugin produce the CSS file (./dist/in.scss.min.css) without UTF-8 BOM.

Please paste the results of npx webpack-cli info here, and mention other relevant information

System: OS: Windows 10 10.0.19042 CPU: (8) x64 Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz Memory: 2.59 GB / 15.79 GB Binaries: Node: 12.22.4 - C:\Program Files\nodejs\node.EXE npm: 6.14.14 - C:\Program Files\nodejs\npm.CMD Browsers: Chrome: 92.0.4515.131 Edge: Spartan (44.19041.1023.0), Chromium (92.0.902.84) Internet Explorer: 11.0.19041.1 Packages: html-webpack-plugin: ^5.3.2 => 5.3.2 webpack: ^5.51.2 => 5.51.2 webpack-cli: ^4.8.0 => 4.8.0 webpack-dev-server: ^4.1.0 => 4.1.0

alexander-akait commented 3 years ago

I think oroblem in postcss, we use it as parser and postcss remove UTF-8 BOM, in this plugin we don't touch your content

alexander-akait commented 3 years ago

Also due logic merging modules in one file using UTF-8 BOM is not safe because you will have BOM in the middle of the file, I strongly recommend do not use BOM

sammyhk commented 3 years ago

Also due logic merging modules in one file using UTF-8 BOM is not safe because you will have BOM in the middle of the file, I strongly recommend do not use BOM

This is the most frustrated part, as a webpack & sass user we do not have a choice but the BOM is generated by sass. The original SCSS file (./src/in.scss) is plain ASCII with UTF-8 escaped sequence, but sass will un-escape as UTF-8 character and add BOM to the generated output.

For bundling multiple CSS into one, according to CSS specification, either binary sequence of @charset "UTF-8"; or BOM at the beginning of the file should change the CSS charset to be UTF-8. So the bundling tools should properly handle this case (move @charset "UTF-8"; or BOM to the beginning of the final output instead leaving them in the middle of the file).

Reference: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset https://www.w3.org/TR/CSS2/syndata.html#charset

alexander-akait commented 3 years ago

@charset "UTF-8"; and UTF-8 BOM are different things

alexander-akait commented 3 years ago

Honestly we have here to remove BOM in webpack, some developers want to keep them, some to remove :confused:

sammyhk commented 3 years ago

@charset "UTF-8"; and UTF-8 BOM are different things

At least for CSS it treated as the same thing, according to the specification (https://www.w3.org/TR/CSS2/syndata.html#charset) the initial bytes

will change the charset of the CSS parse to be UTF-8

For this reason, SASS itself will generate @charset "UTF-8"; or BOM based on different mode (https://github.com/sass/dart-sass/blob/main/lib/src/visitor/serialize.dart#L64-L68) NOTE: webpack development mode mapped to SASS expanded mode, webpack production mode mapped to SASS compressed mode by the sass-loader.

As a webpack user, seems SASS do its job properly as it can correctly generate the CSS file when not involving webpack. That must be a problem in webpack or webpack plugins which breaks the things.

alexander-akait commented 3 years ago

hm, yes, you are right, just interesting why do you need UTF-8 BOM? Do you have server with other encoding by default? Why do not use @charset "UTF-8";?

sammyhk commented 3 years ago

hm, yes, you are right, just interesting why do you need UTF-8 BOM? Do you have server with other encoding by default? Why do not use @charset "UTF-8";?

In SCSS even you explicitly write @charset "UTF-8"; it will be erased and SASS will depends on the input source whether it contains UTF-8 character (or UTF-8 character escaped sequence) and the output mode mentioned before to automatically inject either @charset "UTF-8"; or UTF-8 BOM... That is why I'm looking for a way to fix the output CSS as currently there is no way to fix that... For the server encoding, definitely it can return the CSS by content type text/css; charset=UTF-8 header, but it will depends on the server side. I would consider that is a workaround but not a fix as the standalone CSS is still problematic...

alexander-akait commented 3 years ago

Just clarify, do you want:

  1. if developer has @charset "UTF-8"; we keep it but output on the top (limitation, if you have multiple files with different @charset "UTF-8"; we can't merge them)
  2. If developer has UTF-8 BOM we keep it but output on the top (here again the same limitation)
sammyhk commented 3 years ago

Just clarify, do you want:

  1. if developer has @charset "UTF-8"; we keep it but output on the top (limitation, if you have multiple files with different @charset "UTF-8"; we can't merge them)
  2. If developer has UTF-8 BOM we keep it but output on the top (here again the same limitation)

That should be good enough.

alexander-akait commented 3 years ago

Related https://github.com/webpack-contrib/css-loader/issues/1212

alexander-akait commented 3 years ago

Ideally css-loader should trim @charset or UTF-8 BOM and keep it as property of modules, so we can handle it for style-loader/mini-css-extract-plugin/etc

alexander-akait commented 3 years ago

We strip UTF-8 BOM by default for all modules to avoid problems with merging and to avoid other problems (caching for example), so using UTF-8 BOM in webpack has limitation https://github.com/webpack/loader-runner/blob/master/lib/LoaderRunner.js#L11

alexander-akait commented 3 years ago

dart-sass adds @charset when you have non ascii characters in your file https://github.com/webpack-contrib/sass-loader/pull/985/ (need update to the latest version), so now we should handle it for css-loader and problems will be solved