vitejs / vite

Next generation frontend tooling. It's fast!
http://vitejs.dev
MIT License
67.3k stars 6.05k forks source link

.env files with UTF16-LE BOM are ignored #14564

Open shamrin opened 11 months ago

shamrin commented 11 months ago

Describe the bug

Vite does not correctly handle .env files with BOM. For example, it happens if the file is created with Windows echo command:

echo 'VITE_VAR=1' >> .env.development.local

With this file import.meta.env.VITE_VAR expands to undefined.

Reproduction: https://stackblitz.com/edit/vitejs-vite-m3as94?file=main.js

Instead of a silent error I would expect one of two things:

  1. Correct expansion to '1' (best)
  2. Error while trying to process .env.development.local (better)

Debugging

$ xxd .env.development.local
00000000: fffe 5600 4900 5400 4500 5f00 5600 4100  ..V.I.T.E._.V.A.
00000010: 5200 3d00 3100 0d00 0a00                 R.=.1.....

Outside of Windows, the file can be created with the Python script:

with open('.env.development.local', 'wb') as f:
    f.write(b'\xff\xfe')
    f.write('VITE_VAR=1\r\n'.encode('utf-16-le'))

Related issue

https://github.com/motdotla/dotenv/issues/445

Reproduction

https://stackblitz.com/edit/vitejs-vite-m3as94?file=main.js

Steps to reproduce

No response

System Info

System:
    OS: Linux 5.0 undefined
    CPU: (8) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
    Memory: 0 Bytes / 0 Bytes
    Shell: 1.0 - /bin/jsh
  Binaries:
    Node: 18.18.0 - /usr/local/bin/node
    Yarn: 1.22.19 - /usr/local/bin/yarn
    npm: 9.4.2 - /usr/local/bin/npm
    pnpm: 8.6.12 - /usr/local/bin/pnpm
  npmPackages:
    vite: ^4.4.8 => 4.4.11 

Used Package Manager

npm

Logs

No response

Validations

bluwy commented 11 months ago

I guess the fix is that we have to try loading the .env file with multiple tries of encodings here: https://github.com/vitejs/vite/blob/5bb13aa6b299e7352b1db40914cb19cc66a52a19/packages/vite/src/node/env.ts#L37

And that would slightly hurt startup time, so I'm not really sure about fixing this. I think they should be strictly utf-8, similar to how Vite reads any file in general, e.g. https://github.com/vitejs/vite/blob/5bb13aa6b299e7352b1db40914cb19cc66a52a19/packages/vite/src/node/plugins/loadFallback.ts#L8-L20

shamrin commented 11 months ago

@bluwy Yes, trying multiple encodings seems unreliable and potentially slow. Enforcing utf-8 seems like a good solution. However, fsp.readFile does not fail with this UTF-16 BOM file. Here is what happens:

import fsp from 'node:fs/promises'

try {
  const r = await fsp.readFile('./.env.development.local', 'utf-8')
  console.log(r)
  console.log(Array.from(r).map(b => b.charCodeAt(0)))
} catch (e) {
  console.log(e)
}

Output:

��VITE_VAR=1

[
  65533, 65533, 86,  0, 73,
      0,    84,  0, 69,  0,
     95,     0, 86,  0, 65,
      0,    82,  0, 61,  0,
     49,     0, 13,  0, 10,
      0
]

It looks like a fundamental limitation with readFile() call. It simply ignores decoding errors and replaces weirdness with U+FFFD � replacement character. Also keeps invisible '\0' characters when dealing with UTF-16.

I think it could be solved if we decode with util.TextDecoder:

import fsp from 'node:fs/promises'
import util from 'node:util'

const decoder = new util.TextDecoder('utf-8', {fatal: true})
const buffer = await fsp.readFile('./.env.development.local')
try {
  console.log(decoder.decode(buffer))
} catch (e) {
  console.log(e)
}

The above correctly fails with TypeError for the broken .env file:

TypeError: The encoded data was not valid for encoding utf-8
...
{
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}
bluwy commented 11 months ago

I'm still not sure if it's worth fixing, wouldn't that also affect perf? There's a lot of tools today that reads files with utf-8 and they would silently fail too, we're only fixing a portion of the issue.

shamrin commented 11 months ago

@bluwy I believe it's worth fixing.

  1. I was not the first one who stumbled upon Windows echo command default encoding. Examples: dotenv, pipenv, phpdotenv.
  2. It's not the first encoding-related problem that was fixed in Vite: https://github.com/vitejs/vite/issues?q=is%3Aissue+bom+is%3Aclosed
  3. It gives a bad impression when trying Vite for the first time on Windows (when the rest of the team is on Mac or Linux).

I haven't found measurable performance difference. Half of the time TextDecoder was faster:

$ npm run benchmark
 ✓ test/encoding.bench.ts (2) 1309ms
     name                              hz     min      max    mean     p75     p99    p995    p999     rme  samples
   · readFile                   32,921.34  0.0100  12.5700  0.0304  0.0250  0.1150  0.1600  1.7350  ±8.41%    16461  
   · readFile with TextDecoder  36,706.53  0.0100  11.7250  0.0272  0.0200  0.1100  0.1450  1.4550  ±8.11%    18354   fastest

 BENCH  Summary

  readFile with TextDecoder - test/encoding.bench.ts > 
    1.11x faster than readFile

https://stackblitz.com/edit/vitest-dev-vitest-oqhnfg?file=test%2Fencoding.bench.ts

sapphi-red commented 10 months ago

FWIW fs.readFile(, 'utf8') has a fast path (https://github.com/nodejs/node/pull/48658).

Even if we fixed every place in our code base, there'd be plenty of codes that we depend on that assumes the file is encoded in UTF-8. I don't think we can fix this.

shamrin commented 10 months ago

@sapphi-red nice find! However, Vite does not supply "utf8" argument when loading env files. I did a benchmark only to show there would be no measurable difference between current implementation (that silently ignores decoding errors) and a correct one (that tells users about incorrect encoding).

I would argue broken env handling on Windows is common enough to warrant a fix for this specific case (not general). I’ve posted some links in my previous comment.

I think It would be very nice if Vite would not fail silently and in a hard-to-debug manner.

How does Vite contribution flow work? Is it a good idea to get an approval from a maintainer first?