parcel-bundler / lightningcss

An extremely fast CSS parser, transformer, bundler, and minifier written in Rust.
https://lightningcss.dev
Mozilla Public License 2.0

UTF-8 BOM Handling #338

Open Silvenga opened 2 years ago

Silvenga commented 2 years ago

After upgrading Parcel on a project, we started seeing these errors:

#26 1.250 @parcel/transformer-css: Unexpected token AtKeyword("import")
#26 1.250
#26 1.250   /source/src/blah/css/app.css:1:2
#26 1.250   > 1 | @import url('./bootstrap.css');
#26 1.250   >   |  ^
#26 1.250     2 |
#26 1.250     3 | @import url('./materialdesignicons.css');

The change in behavior appears to stem from Parcel's switch to lightningcss.

Looking deeper, it appears the error comes from the UTF-8 BOM at the start of this file.


I'm not sure what the CSS spec says about the BOM, but I would propose better detection of the BOM, or at least a clearer error message when one is present, to improve the developer experience.
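As a rough illustration of the kind of detection proposed above, here is a minimal sketch that assumes the CSS source is available as a Node.js Buffer before it reaches the parser. The helper name `assertNoUtf8Bom` and the message wording are hypothetical; this is not an existing Parcel or lightningcss API.

```javascript
// Hypothetical pre-check (not part of Parcel or lightningcss): report a
// leading UTF-8 BOM explicitly instead of letting the parser fail with
// "Unexpected token AtKeyword(...)".
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

function assertNoUtf8Bom(code, filename) {
  // Only the first three bytes need to be inspected.
  if (code.length >= 3 && code.subarray(0, 3).equals(UTF8_BOM)) {
    throw new Error(
      `${filename}: file starts with a UTF-8 byte order mark (EF BB BF); ` +
        're-save it as UTF-8 without a BOM.'
    );
  }
}

// Usage: a file like app.css from the report above.
const appCss = Buffer.from("\ufeff@import url('./bootstrap.css');", 'utf8');
try {
  assertNoUtf8Bom(appCss, 'app.css');
} catch (err) {
  console.log(err.message);
}
```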

richardkmichael commented 1 year ago

@Silvenga

Thank you for mentioning this!

To repro, I used vim (:set bomb / :set nobomb) to toggle the BOM on a file in a tiny test project. The BOM is definitely the problem.

# index.js
import './main.css'

# main.css, this file will have a BOM added/removed
@import 'another.css';

# another.css
/* any rule */

(Alternatively, run the steps below on the fly with parcel watch index.js.)


 $ od -t x main.css # BOM --> ..bfbbef, saved with `:set bomb`
0000000 40bfbbef 6f706d69 27207472 746f6e61
0000020 2e726568 27737363 00000a3b
0000032

 $ npx parcel build index.js
🚨 Build failed.

@parcel/transformer-css: Unexpected end of input

  /user/proj/src/main.css:2:1
    1 | @import 'another.css';
  > 2 |
  >   | ^

 $ od -t x main.css  # No BOM, saved with `:set nobomb`
0000000 706d6940 2074726f 6f6e6127 72656874
0000020 7373632e 000a3b27
0000027

 $ npx parcel build index.js
✨ Built in 402ms

../index.js     96 B    20ms
../index.css    76 B    20ms
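Why the BOM appears as `..bfbbef` in the od output above: od -t x prints 4-byte words in the host's byte order, which on a little-endian machine (such as x86-64) reverses each group of four bytes. The quick Node.js check below reproduces that reading, assuming the dumps came from a little-endian host. It also shows that the final od offsets are octal, so the two files differ by exactly the three BOM bytes.

```javascript
// First four bytes of main.css with and without the BOM ('@' is 0x40).
const headWithBom = Buffer.from([0xef, 0xbb, 0xbf, 0x40]); // EF BB BF '@'
const headWithoutBom = Buffer.from('@imp', 'latin1');       // '@' 'i' 'm' 'p'

// Reading each group of four bytes as a little-endian 32-bit integer
// reproduces the first word of each od dump.
console.log(headWithBom.readUInt32LE(0).toString(16));    // 40bfbbef
console.log(headWithoutBom.readUInt32LE(0).toString(16)); // 706d6940

// The final od offsets are octal: 032 vs 027, i.e. 26 vs 23 bytes.
console.log(parseInt('32', 8) - parseInt('27', 8)); // 3, the size of the BOM
```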

I did briefly investigate this:

I thought .trim() might be a quick fix, but unfortunately it (like .trim_start(), etc.) does not remove the BOM. Rust defines whitespace for these methods by the Unicode White_Space property, and the BOM (U+FEFF, zero width no-break space) has White_Space = no (second table here).
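The White_Space point can be checked from JavaScript via Unicode property escapes. This sketch also shows a contrast worth keeping in mind: unlike Rust's str::trim(), JavaScript's own whitespace definition (used by `\s` and `String.prototype.trim()`) does include U+FEFF, so the same "just trim it" idea behaves differently across languages.

```javascript
const BOM = '\ufeff'; // U+FEFF ZERO WIDTH NO-BREAK SPACE

// Unicode's White_Space property, which Rust's str::trim() is defined in
// terms of, does not include U+FEFF...
console.log(/\p{White_Space}/u.test(BOM)); // false

// ...whereas ECMAScript's WhiteSpace production does, so JavaScript's
// trim() removes it.
console.log(/\s/.test(BOM));     // true
console.log((BOM + 'a').trim()); // a
```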

There is a strip_bom crate, and it worked fine in my test. Detecting and removing the various Unicode BOMs manually would probably be a nuisance.
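For a sense of what manual detection involves, here is a rough sketch that matches the common BOM byte signatures at the start of a Buffer or Uint8Array. The function name `detectBom` and its return shape are hypothetical, and this is not the strip_bom crate's API. Signature order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so it must be checked first. As a result, a UTF-16 LE file whose first character is U+0000 would be misreported as UTF-32 LE.

```javascript
// Byte signatures of the common Unicode BOMs, longest-ambiguous first.
const BOM_SIGNATURES = [
  ['utf-8', [0xef, 0xbb, 0xbf]],
  ['utf-32le', [0xff, 0xfe, 0x00, 0x00]], // must precede utf-16le
  ['utf-32be', [0x00, 0x00, 0xfe, 0xff]],
  ['utf-16le', [0xff, 0xfe]],
  ['utf-16be', [0xfe, 0xff]],
];

// Returns { encoding, length } for a recognized BOM, or null if none.
function detectBom(bytes) {
  for (const [encoding, sig] of BOM_SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return { encoding, length: sig.length };
    }
  }
  return null;
}

console.log(detectBom(Buffer.from([0xef, 0xbb, 0xbf, 0x40]))); // { encoding: 'utf-8', length: 3 }
console.log(detectBom(Buffer.from('@import', 'latin1')));      // null
```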

Since a goal of lightningcss is to be fast, the performance cost of BOM handling might be worth benchmarking.
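For the UTF-8 case alone, stripping can at least be made cheap: a minimal sketch, assuming the caller holds the source as a Node.js Buffer, only compares three bytes and returns a view that shares memory with the input, so no copy is made regardless of file size. The helper `stripUtf8Bom` is hypothetical. Passing the result as the `code` option of lightningcss's `transform()` is shown only in a comment; this approach does not help where the bundler reads files itself.

```javascript
// Hypothetical helper: returns `code` without a leading UTF-8 BOM.
// subarray() creates a view over the same memory, so the cost is a
// three-byte comparison, independent of file size.
function stripUtf8Bom(code) {
  const hasBom =
    code.length >= 3 && code[0] === 0xef && code[1] === 0xbb && code[2] === 0xbf;
  return hasBom ? code.subarray(3) : code;
}

// e.g. transform({ filename: 'main.css', code: stripUtf8Bom(code) })
const src = Buffer.from('\ufeff@import "another.css";', 'utf8');
const stripped = stripUtf8Bom(src);
console.log(stripped.toString('utf8'));      // @import "another.css";
console.log(stripped.buffer === src.buffer); // true (same underlying memory)
```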

On the other hand, a BOM in CSS is probably unusual (even if permitted in theory), and authors can often simply re-save their source files without it.

peterjanes commented 1 year ago

I don't think this is the right way to handle the problem, but perhaps it might be useful to someone. (I wrote it before realizing https://github.com/parcel-bundler/lightningcss/issues/82 would prevent using it with bundle().)

const { transform } = require('lightningcss');

const res = transform({
  filename: 'test.css',
  code: Buffer.from('\ufeff:root { --foo: bar; }'),
  visitor: {
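    Selector(selector) {
      // A leading BOM is tokenized as part of an identifier, so it shows up
      // as a type selector named '\ufeff' at the start of the first selector.
      if (selector[0]?.type === 'type' && selector[0].name === '\ufeff') {
        // Drop only that bogus component (slice(-1) would also drop e.g. the
        // ':root' in '\ufeff:root.dark'), plus a following combinator if the
        // BOM was followed by whitespace.
        const rest = selector.slice(1);
        return rest[0]?.type === 'combinator' ? rest.slice(1) : rest;
      }
      return selector;
    }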
  }
});
const code = res.code.toString();
console.log(code, /^:root/.test(code));

(For a BOM in CSS in the wild, have a look at mini.css.)

DanielVernall commented 1 year ago

I had the same problem when importing a dependency using the @import syntax. I worked around the issue by saving without the BOM:

Here's how to do it in VS Code:

VS Code shows the file encoding in the status bar at the bottom right.

You can click the encoding, then click "Save with encoding" to change the file encoding.


It's interesting that VS Code "guesses" it should be UTF-8 with BOM instead of straight UTF-8.

Parcel shows no errors on build.

It would be nice for Parcel to handle this, or at least give a better error message. This is fairly obscure, especially for those who haven't heard of the UTF-8 BOM before.

Silvenga commented 1 year ago

@DanielVernall

It's interesting that VS Code "guesses" it should be UTF-8 with BOM instead of straight UTF-8.

VS Code is actually not guessing; that's the point of the BOM. It's a magic byte sequence (EF BB BF) that tells programs the file is UTF-8. That said, the industry at large has moved toward guessing the encoding instead, and Windows is the main platform that prefers a BOM, so that programs don't need to guess. But since a lot of programs were designed for Linux, BOM support can be spotty, which causes problems like this issue.
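The uneven BOM support mentioned above is easy to see even within Node.js. In this small sketch, the WHATWG `TextDecoder` treats a leading UTF-8 BOM as an encoding signature and drops it by default, while `Buffer#toString('utf8')` keeps it as a U+FEFF character that then reaches whatever consumes the string, such as a CSS parser.

```javascript
const bytes = Buffer.from([0xef, 0xbb, 0xbf, 0x61]); // BOM + 'a'

// A BOM-aware decoder consumes the BOM as a signature...
const decoded = new TextDecoder('utf-8').decode(bytes);
console.log(decoded.length); // 1

// ...while Buffer#toString() keeps it as a U+FEFF character.
const naive = bytes.toString('utf8');
console.log(naive.length, naive.charCodeAt(0).toString(16)); // 2 feff
```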

DanielVernall commented 1 year ago

@Silvenga, yeah, this makes sense. But I'm not really sure how VS Code decides which encoding to save a file with. It's not something you select on save, and when I try it, creating a new file defaults to UTF-8 without the BOM; yet 4 other files (out of 40) were also encoded with the BOM.

Silvenga commented 1 year ago

There are a couple of settings that affect the default encoding, for example any editor config files in the directory tree. For this exact case, it's hard to tell.

I would expect VS Code to save without a BOM by default, unless the file is already in some other encoding, such as UTF-8 with BOM.