Closed evanw closed 3 years ago
Thanks, yes the API of this project could be more polished to avoid these footguns. It was actually originally written for https://github.com/guybedford/es-module-shims hence why it is entirely offset based, with a heavy focus on the overall library footprint.
I've added a new section to the readme in https://github.com/guybedford/es-module-lexer#escape-sequences, I think that should cover the concerns, but further clarifications welcome.
For this reason, when handling specifiers or identifiers it can advisable to pass the full string to
JSON.parse
:JSON.parse(source.substring(imports[i].s, imports[i].e));
That workaround had occurred to me too. However, it doesn't work for a few reasons:
It's missing the quote characters, so it won't actually work at all. You'd have to do this instead:
JSON.parse('"' + source.substring(imports[i].s, imports[i].e) + '"');
This is JavaScript, not JSON. They support different escape sequences. Here are some counter-examples that don't work with this approach:
import './\x61\x62\x63.js';
import './\u{20204}.js';
It will technically break for single-quoted strings containing a double quote. I assume this case would never come up in real code though.
It's missing the quote characters, so it won't actually work at all. You'd have to do this instead:
The .s
and .e
offset indices include the "
quote characters exactly for this reason, so you don't need additions.
This is JavaScript, not JSON. They support different escape sequences.
Both of the examples support JSON.parse
in my test.
It will technically break for single-quoted strings containing a double quote
Because of the first point, this case works out fine too.
Because of the first point, this case works out fine too.
Ah, of course this doesn't, will put some thought to this one.
Here's my test:
import lexer from 'es-module-lexer';
function tryCatch(fn) {
console.log(`\n${fn}:`);
try {
return fn();
} catch (err) {
return err;
}
}
let js, imp;
js = `import 'abc.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse(js.substring(imp.s, imp.e))));
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));
js = `import './\\x61\\x62\\x63.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));
js = `import './\\u{20204}.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));
I'm running with node v15.2.0 and es-module-lexer v0.3.26 from npm. This is the output:
() => JSON.parse(js.substring(imp.s, imp.e)):
SyntaxError: Unexpected token a in JSON at position 0
at JSON.parse (<anonymous>)
at file:///Users/evan/Desktop/example.mjs:16:33
at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
at file:///Users/evan/Desktop/example.mjs:16:13
() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
abc.js
() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
SyntaxError: Unexpected token x in JSON at position 4
at JSON.parse (<anonymous>)
at file:///Users/evan/Desktop/example.mjs:21:33
at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
at file:///Users/evan/Desktop/example.mjs:21:13
() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
SyntaxError: Unexpected token { in JSON at position 5
at JSON.parse (<anonymous>)
at file:///Users/evan/Desktop/example.mjs:25:33
at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
at file:///Users/evan/Desktop/example.mjs:25:13
Edit: I think indirect eval
should work fine though:
import lexer from 'es-module-lexer';
function tryCatch(fn) {
console.log(`\n${fn}:`);
try {
return fn();
} catch (err) {
return err;
}
}
let js, imp;
js = `import 'abc.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));
js = `import './\\x61\\x62\\x63.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));
js = `import './\\u{20204}.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));
js = `import './".js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));
The output:
() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
abc.js
() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./abc.js
() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./𠈄.js
() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./".js
Ok, I've put together a PR in https://github.com/guybedford/es-module-lexer/pull/62. Let me know if that seems like it can work here.
Yeah that approach seems good to me.
Released in 0.4.0.
I can confirm the fix. Looks good!
Context: https://github.com/vitejs/vite/issues/2083. Import paths are strings and can therefore contain escape sequences, like this:
Right now the documentation for this module seems to encourage using this library in a way that doesn't handle any escape sequences. Taking a raw slice of the input means that the backslashes are literal backslashes in the path instead of the actual underlying escape sequences. So in other words you would get the string
"./\\u0442\\u0435\\u0441\\u0442"
instead of the string"./тест"
and then fail to import the path.This can of course be worked around by interpreting the escape sequences yourself. But I thought I'd file this issue because it could be nice for either this library to do that for you or for there to be a recommended way of doing this, since one reason to use this library is to extract the import paths. If that's out of scope, maybe just calling out that you need to be sure to handle escape sequences yourself when you use this library with a link to the spec?