Integrate Test262 - Githubissues

jugglinmike commented 5 years ago

Retrieve the Test262 project from its canonical location on GitHub.com. Use the test262-stream module to create tests in accordance with that project's interpretation guidelines. Use the test-interpreter project to maintain a expectations file of tests which Esprima is currently expected to fail.

This patch introduces two new npm dependencies during development:

test262-stream - A Node.js readable stream for traversing the Test262 test suite (this creates the test code based on the contents of the Test262 directory)
results-interpreter - A Node.js writable stream for interpreting streaming test results (this implements the code necessary to interpret the results in terms of an expectations file)

These projects were created to reduce duplication (both in code and in maintenance effort) between Test262 consumers. They're both used by JSHint, Babel, and Esprima. Acorn uses just test262-stream, along with many folks testing JavaScript runtimes through test262-harness.

As noted in the process output, the expectations file can be automatically updated by specifying the --update-expectations command-line flag. It's also safe to manually insert comments, so you folks can keep notes about the failures in-line.

Here's the summary of the results:

✔ 49969 valid programs parsed without error
✔ 2493 invalid programs produced a parsing error
✔ 2879 invalid programs did not produce a parsing error (and allowed by the expectations file)
✔ 11223 valid programs produced a parsing error (and allowed by the expectations file)

This is meant as a starting point. In addition to the feedback you folks have for me, there are a few details which I'd like to work out:

Where should the Test262 repository be placed? The "examples" directory would make sense conceptually, but all of its contents are currently interpreted as valid literal program code, so that won't work with the current setup.
How can we parse script code? I haven't been able to find documentation for determining the top-level parsing goal, though it seems like it currently defaults to "module code."

jugglinmike commented 5 years ago

@lee-dohm @maxbrunsfeld This patch can no longer be merged to do conflicting changes on the master branch. I can see about resolving them, but are you still interested in integrating Test262?

jugglinmike commented 4 years ago

@lee-dohm @maxbrunsfeld this patch was meant as a follow-up to our discussion in the language-javascript project. Is there someone else I should be talking to about it?

mjambon commented 3 years ago

Hey @jugglinmike. I'm following up on this very old issue and pull request.

A couple of us working on semgrep are now members of the tree-sitter-javascript and tree-sitter-typescript projects, as well as grammars for various other languages. We are particularly interested in having reliable parsers and fixing errors before our customers encounter them.

I think we would greatly benefit from testing tree-sitter-javascript against Test262. Let me see if I can answer your original questions:

Where should the Test262 repository be placed? The "examples" directory would make sense conceptually, but all of its contents are currently interpreted as valid literal program code, so that won't work with the current setup.

Given that test and examples already have a standard structure across all tree-sitter projects, it should go to another folder. Maybe we just call it test262 unless someone has a suggestion for something more abstract like misc or extra or other-tests. We can always rename it later since it not something users should rely on.

How can we parse script code? I haven't been able to find documentation for determining the top-level parsing goal, though it seems like it currently defaults to "module code."

I don't know about javascript scripts. How are they different from any other javascript file? Is it just that first line is a magic number like #!/usr/bin/env node? At the moment, tree-sitter supports only one entry point per grammar, which the first rule occurring in the grammar, as generated by grammar.js. Here, it's program:

  rules: {
    program: $ => seq(
      optional($.hash_bang_line),
      repeat($.statement)
    ),

An optional hashbang line is supported so maybe this answers the question about scripts.

I'd be in favor of moving forward with this PR. Our shared concerned is maintenance. I don't see an issue here since this is not an intrusive change. Even if running the Test262 suite for some reason becomes a problem, we can always disable it.

Let's see what others have to say. I'll follow up in a week if there's no reaction.

Cc: my colleague at r2c @aryx, and @maxbrunsfeld

jugglinmike commented 3 years ago

How can we parse script code? I haven't been able to find documentation for determining the top-level parsing goal, though it seems like it currently defaults to "module code."

I don't know about javascript scripts. How are they different from any other javascript file? Is it just that first line is a magic number like #!/usr/bin/env node?

Hi there, @mjambon. "Script" and "Module" are two parsing goals for JavaScript programs. The first is how JavaScript interpreters (including browsers, Node.js, and parsers) historically parsed all JavaScript programs. The second was introduced by ES2015. The biggest differences between the two are: Modules allow export/import declarations (Scripts do not), and Modules are always strict mode code (Scripts require a "use strict" directive to be interpreted in strict mode).

These two goals cannot be differentiated by inspecting the program source code itself; interpreters have to receive that information out-of-band. For browsers, that's done by adding type="module" to the HTML <script> element. For Node.js, that's done either through the use of the .mjs filename extension or through configuration in the package.json manifest. Parsers have their own ways of accepting this information.

Some Test262 tests are written specifically for the Module parsing goal. When running the tests, you'll need to tell make tree-sitter-javascript aware of this. For instance, Test262 might include one test for Script code which is expected to produce a SyntaxError:

export default 0;

And it might have another test with the same source, but that's expected to pass as Module code.

In order to pass both of those tests, you'll nee to make tree-sitter-javascript aware of the parsing goal.

You can read Test262's INTERPRETING.md document to learn more about how to interpret the tests. The test262-stream project that this patch uses takes care of some of those details (e.g. making "strict mode" and "non strict mode" versions of tests), but the top-level parsing goal is something that needs a custom integration. Here's an example for how we did that with the JSHint linter.

aryx commented 3 years ago

How long does it take to parse test262? I remember this was a really huge repository with a huge number of files (like > 10 000 if I remember correctly).

aryx commented 3 years ago

test262 could be part of a parsing testsuite, like the one we have in semgrep with a list of top 20 github repo for each language.

mjambon commented 3 years ago

Thanks for the explanation, @jugglinmike.

For instance, Test262 might include one test for Script code which is expected to produce a SyntaxError

Tree-sitter is focused on text-editor syntax highlighting, and semgrep uses it for searching for patterns in code. In general, we're ok with invalid code that parses without an error. A problem would be valid code that doesn't parse or valid code that's misinterpreted, producing an incorrect parse tree.

That's why I think the tree-sitter-javascript parser, with its single entrypoint, should be able to parse both scripts and modules. If a user wants to make sure there are no export and such in a script, this can be done after parsing, by inspecting the parse tree (or CST - concrete syntax tree) produced by tree-sitter.

Earlier, you showed this result:

[TP] ✔ 49969 valid programs parsed without error
[TN] ✔ 2493 invalid programs produced a parsing error
[FP] ✔ 2879 invalid programs did not produce a parsing error (and allowed by the expectations file)
[FN] ✔ 11223 valid programs produced a parsing error (and allowed by the expectations file)

So if we run our grammar on Test262, we'll pay attention only to lines TP and FN. We'll try minimize FN (false negatives) and maximize TP (true positives). In other words, we don't care whether invalid programs are reported as valid or invalid.

For IDE support, it's actually best to parse as much as possible of an invalid program. Tree-sitter has an error recovery mechanism which allows producing a partial parse tree, with special ERROR nodes for the portions that it can't parse. This mechanism is also useful to tolerate errors due to the grammar being incomplete.

tree-sitter / tree-sitter-javascript

Integrate Test262 #83