Open EricDunsworth opened 6 years ago
Hi @EricDunsworth,
thank you for this detailed issue. This seems to be related to a unsupported encoding setting which is passed from editorconfig to readFileSync. I think lintspaces needs to check if the received encoding is a supported value for node's readFileSync(). I'm not clear about what will happen when lintspaces provides a fallback (utf-8-bom
to utf-8
) if the encoding isn't supported.
I need to dive a bit deeper into it and will fix this problem in a future release. However, if you need a faster fix, I would be very happy to accept a PR to solve this.
It seems Node itself doesn't support the "BOM" version of encodings.
Also looking at the code, it seems that this package try to read the file with the given encoding without trying to verify that the file is in this encoding. This could be achieve with ced for example. It use some C++ locally compiled source which feels bad when trying to reduce the number of dependencies to install (require node-gyp which requires some C++ compiler).
I assume having a fallback to make all the other checks is good as "BOM" flag doesn't change meaning of EOL, spaces, ending line.
According to editorconfig website, it supports : latin1, utf-8, utf-8-bom, utf-16be and utf-16le And Node seems to support : latin1, utf-8 and utf-16le
I am really not an expert of encoding. It seems safe to fallback from utf-8-bom to utf-8 and from utf-16be to utf-16le. What do you think ?
We could consider using chardet instead. This library has no dependencies and does not require native code or bindings (see here https://www.npmjs.com/package/chardet#chardet). However, @Bartheleway makes a good point that we might need to either narrow down the list of supported encodings or attempt to open the file based on the encoding setting. Here’s an example approach:
const chardet = require('chardet');
const fs = require('fs');
// Path to the file you want to check
const filePath = 'path/to/your/file.txt';
// List of supported encodings
const supportedEncodings = ['latin1', 'utf-8', 'utf-16le'];
// Detect the file encoding
const encoding = chardet.detectFileSync(filePath);
// Check if the file encoding is supported
if (!supportedEncodings.includes(encoding)) {
throw new Error('Unsupported encoding');
}
// Warn if the encoding does not match the settings
if (typeof validator.settings.encoding === 'string' && validator.settings.encoding !== encoding) {
console.warn('Encoding mismatch: setting does not match file encoding');
}
// Read the file with the detected encoding
fs.readFileSync(filePath, encoding);
I looked into chardet
and it doesn't seems to detect BOM ... also I think we should report
encoding error like package is doing for lines but I didn't find how to add an error to the whole file.
Maybe we can use jschardet
I'm not sure if we need to detect it (we could skip detection), as Node.js doesn't support it anyway. If chardet or jschardet detects an unsupported encoding, we could attempt to read the file with a warning (hint). If this fails, we would then report an error.
I didn't find how to add an error to the whole file.
This is currently not possible given the structure of our reporting. However, we could attach the hint/warning to the first line of the file as a workaround.
{
"/path/to/file.ext": {
"1": [
{
"line": 1,
"code": "ENCODING_UNSUPPORTED",
"type": "hint",
"message": "Unsupported encoding of file."
},
{
"line": 1,
"code": "ENCODING_ERROR",
"type": "warning",
"message": "Can not read file."
},
],
}
}
Note: Lintspaces currently supports two types of messages: hint
(info) and warning.
An error
level is not currently supported but can be introduced. The warning
level currently is the highest available and may be considered equivalent to an error
in other linting tools, such as ESLint.
Lintspaces doesn't claim to support .editorconfig's
charset
property, but appears to be reading it and attempting to use it in a way that can lead to exceptions.An exception can occur if the
charset
property's value is unrecognized. Out of EditorConfig's 5 supported charsets, lintspaces seems to understandlatin1
,utf-8
andutf-16le
, but errors out withutf-16be
andutf-8-bom
.The same is also true for
unset
/invalid values. Btw I didn't cover thecharset
property in #47 since its exception would've prevented other warnings from appearing.Would it be possible to either completely ignore the
charset
property or improve lintspace's handling of it?Code/Output samples... The samples below demonstrate how lintspaces currently reacts to a
charset = utf-8-bom
property..editorconfig
test.txt (BOM's presence doesn't matter)
check.js
Output when running
node check.js
in Windows:Expected output of
node check.js
: