guybedford / es-module-lexer

Low-overhead lexer dedicated to ES module parsing for fast analysis
MIT License
912 stars 47 forks source link

fix: handle regexp as default export #167

Closed fpipita closed 4 months ago

fpipita commented 4 months ago

Hi,

this pull request aims at handling another case of division/regexp ambiguity which I encountered in the attempt of parsing the recently added module https://github.com/markdown-it/uc.micro/blob/master/categories/S/regex.mjs:

The ambiguity can be triggered by trying to parse this minimal example:

export default /[`]/

The solution I came up with to decide whether the next token was a regexp, was to basically check if the last token position was compatible with the position of the last detected export name.

guybedford commented 4 months ago

Thanks for finding this and putting together a fix.

The approach isn't quite right unfortunately - this wouldn't handle the false positive - export default 1 / 2 and would think that it is a regex.

Instead the fix here is to integrate the default keyword itself into the regular expression logic. Because the default keyword cannot come in any other positions, it should be enough to add it to this detection list in isExpressionKeyword - https://github.com/guybedford/es-module-lexer/blob/main/src/lexer.c#L854.

That detection is saying that if the preceding keyword is one of those, then it must be a regex, as it is the start of an expression and not an internal part of a logical expression.

It should also be an even smaller diff I believe. Let me know if that works out.

guybedford commented 4 months ago

Ah it seems I spoke too soon - this is correct because you are validating that the regex comes immediately after the export default and not in any other position.

I would be grateful if we could add some tests for this case and also some tests for having comments between the export default and the regex:

export default 1/2
export default /* asdf */ 1/2
export default /* asdf */ /regex/
export default
// line comment
/regex/
export default
// line comment
1 / 2

, but then it seems we should be fine to land actually.

fpipita commented 4 months ago

Hi, thank for your quick reply and review.

I've added the unit tests and I can confirm that, by inspecting the execution through the debugger, all of them are parsed as expected.

However, in order to improve the reliability of the unit tests for the particularly tricky regexp/division scenario, in my opinion it would be useful to expose the value of the lastSlashWasDivision flag as "private metadata" to each export.

With this information available, in unit tests we could make sure that the ambiguity is handled as expected, eg.:

test("Regexp default export", () => {
  const [, exports] = parse("export default /[`]/");
  assert.deepStrictEqual(
    exports.map(expt => ({name: expt.n, regexp: expt.regexp})), 
    [{name: "default", regexp: true}]
  );
});
guybedford commented 4 months ago

Great, many thanks for checking the edge cases here. Will post out the patch release now.