tree-sitter / tree-sitter-javascript

JavaScript grammar for tree-sitter
MIT License

Segmentation Fault when Parsing Javascript #306

Closed: soohoonc closed this 4 weeks ago

soohoonc commented 2 months ago

The following piece of code is valid but it is parsed incorrectly:

/**
   post-js-header.js is to be prepended to other code to create
   post-js.js for use with Emscripten's --post-js flag. This code
   requires that it be running in that context. The Emscripten
   environment must have been set up already but it will not have
   loaded its WASM when the code in this file is run. The function it
   installs will be run after the WASM module is loaded, at which
   point the sqlite3 JS API bits will get set up.
*/
if(!Module.postRun) Module.postRun = [];
Module.postRun.push(function(Module/*the Emscripten-style module object*/){
  'use strict';
  /* This function will contain at least the following:

     - post-js-header.js (this file)
     - sqlite3-api-prologue.js  => Bootstrapping bits to attach the rest to
     - common/whwasmutil.js     => Replacements for much of Emscripten's glue
     - jaccwaby/jaccwabyt.js    => Jaccwabyt (C/JS struct binding)
     - sqlite3-api-glue.js      => glues previous parts together
     - sqlite3-api-oo.js        => SQLite3 OO API #1
     - sqlite3-api-worker1.js   => Worker-based API
     - sqlite3-vfs-helper.c-pp.js  => Utilities for VFS impls
     - sqlite3-vtab-helper.c-pp.js => Utilities for virtual table impls
     - sqlite3-vfs-opfs.c-pp.js  => OPFS VFS
     - sqlite3-vfs-opfs-sahpool.c-pp.js => OPFS SAHPool VFS
     - sqlite3-api-cleanup.js   => final API cleanup
     - post-js-footer.js        => closes this postRun() function
  */

Test script:

// index.js

const Parser = require('tree-sitter')
const JavaScript = require('tree-sitter-javascript')
const fs = require('fs')

const parser = new Parser()
parser.setLanguage(JavaScript)

const contents = fs.readFileSync('test.js', 'utf8')

const root = parser.parse(contents)

Running the test script with node index.js produces the following output:

segmentation fault  node index.js

I get a segmentation fault when trying to parse this file: https://github.com/sqlite/sqlite/blob/master/ext/wasm/api/post-js-header.js
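
Because a segmentation fault kills the whole Node.js process, one way to confirm and report it is to run the test script in a child process and inspect how that process exited. The following is a minimal sketch, assuming the script above is saved as index.js; describeExit and runIsolated are hypothetical helper names and are not part of tree-sitter.

```javascript
const { spawnSync } = require('child_process')

// Turn the result of spawnSync into a short human-readable summary.
function describeExit(result) {
  if (result.error) return `failed to start: ${result.error.message}`
  if (result.signal) return `killed by signal ${result.signal}`
  return `exited with code ${result.status}`
}

// Run a script with the current Node.js binary in a separate process,
// so that a native crash does not take down the caller.
function runIsolated(scriptPath) {
  const result = spawnSync(process.execPath, [scriptPath], { encoding: 'utf8' })
  return {
    summary: describeExit(result),
    stdout: result.stdout,
    stderr: result.stderr,
  }
}
```

Calling runIsolated('index.js').summary on an affected setup would be expected to report a signal rather than an exit code, which makes the crash easy to record alongside the installed package versions.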

maxbrunsfeld commented 1 month ago

I can't reproduce this with the latest versions of the Tree-sitter CLI (0.22.2) and the master branch of this repository.

What version of the tree-sitter CLI are you using?
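
For the Node.js packages, the installed versions can be read straight from node_modules. This is a sketch only, assuming it is run from the project root with a conventional node_modules layout; installedVersion is a hypothetical helper, not something either package provides.

```javascript
const fs = require('fs')
const path = require('path')

// Read the "version" field of an installed package's package.json.
// Returns null when the package is not found under <root>/node_modules.
function installedVersion(name, root = process.cwd()) {
  const manifest = path.join(root, 'node_modules', name, 'package.json')
  try {
    return JSON.parse(fs.readFileSync(manifest, 'utf8')).version
  } catch {
    return null
  }
}

// Print the versions that matter for this report.
for (const name of ['tree-sitter', 'tree-sitter-javascript']) {
  console.log(`${name}: ${installedVersion(name) ?? 'not installed'}`)
}
console.log(`node: ${process.version}`)
```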

JacksonKearl commented 1 month ago

test.js:

import Javascript from "tree-sitter-javascript"
import TS from "tree-sitter-typescript"
import Parser from "tree-sitter"

const parser = new Parser()
// console.log(TS.typescript)
parser.setLanguage(Javascript)

const tree = parser.parse(`/**
post-js-header.js is to be prepended to other code to create
post-js.js for use with Emscripten's --post-js flag. This code
requires that it be running in that context. The Emscripten
environment must have been set up already but it will not have
loaded its WASM when the code in this file is run. The function it
installs will be run after the WASM module is loaded, at which
point the sqlite3 JS API bits will get set up.
*/
if(!Module.postRun) Module.postRun = [];
Module.postRun.push(function(Module/*the Emscripten-style module object*/){
'use strict';
/* This function will contain at least the following:

  - post-js-header.js (this file)
  - sqlite3-api-prologue.js  => Bootstrapping bits to attach the rest to
  - common/whwasmutil.js     => Replacements for much of Emscripten's glue
  - jaccwaby/jaccwabyt.js    => Jaccwabyt (C/JS struct binding)
  - sqlite3-api-glue.js      => glues previous parts together
  - sqlite3-api-oo.js        => SQLite3 OO API #1
  - sqlite3-api-worker1.js   => Worker-based API
  - sqlite3-vfs-helper.js    => Internal-use utilities for...
  - sqlite3-vfs-opfs.js      => OPFS VFS
  - sqlite3-api-cleanup.js   => final API cleanup
  - post-js-footer.js        => closes this postRun() function
*/`)

console.log(tree.rootNode.toString())

Terminal session:

➜  treesitter-test node -v
v18.19.1
➜  treesitter-test npm i tree-sitter-javascript@0.20.3

up to date, audited 59 packages in 463ms

11 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
➜  treesitter-test node test.js                       
[1]    6955 segmentation fault  node test.js
➜  treesitter-test npm i tree-sitter-javascript@0.20.1

changed 1 package, and audited 59 packages in 2s

11 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
➜  treesitter-test node test.js                       
(ERROR (comment) (if_statement condition: (parenthesized_expression (unary_expression argument: (member_expression object: (identifier) property: (property_identifier)))) consequence: (expression_statement (assignment_expression left: (member_expression object: (identifier) property: (property_identifier)) right: (array)))) (member_expression object: (member_expression object: (identifier) property: (property_identifier)) property: (property_identifier)) (formal_parameters (identifier) (comment)) (expression_statement (string (string_fragment))) (comment))
➜  treesitter-test npm i tree-sitter-javascript@latest

changed 1 package, and audited 59 packages in 2s

11 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
➜  treesitter-test node test.js                       
[1]    8454 segmentation fault  node test.js

JacksonKearl commented 1 month ago

That's with tree-sitter@0.20.6. With tree-sitter@latest (0.21.0), I get:

/Users/jacksonkearl/Contractor/treesitter-test/node_modules/tree-sitter/index.js:338
    setLanguage.call(this, language);
                ^

TypeError: Invalid language object
    at Parser.setLanguage (/Users/jacksonkearl/Contractor/treesitter-test/node_modules/tree-sitter/index.js:338:17)
    at file:///Users/jacksonkearl/Contractor/treesitter-test/test.js:7:8
    at ModuleJob.run (node:internal/modules/esm/module_job:195:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:336:24)
    at async loadESM (node:internal/process/esm_loader:34:7)
    at async handleMainPromise (node:internal/modules/run_main:106:12)

Node.js v18.19.1

maxbrunsfeld commented 1 month ago

If the error only happens with the Node.js binding, I think we should probably close this and reopen the issue there.

That module is going through a breaking change: it was recently converted to use NAPI instead of the old V8-specific Node API. That might be why you're seeing "Invalid language object": the tree-sitter-javascript Node bindings files need to be regenerated.

@soohoonc Parsing that file with the current tree-sitter CLI doesn't seem to be a problem. When you originally hit this, was it with the Node.js binding?
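
When the binding and the grammar package are out of step, the failure surfaces as a TypeError from setLanguage, as in the trace above. The sketch below catches that case and turns it into a clearer message. trySetLanguage is a hypothetical helper, and the hint text is an assumption based on this thread rather than anything the library emits. It cannot catch the segmentation fault, which terminates the process before any JavaScript error handling runs.

```javascript
// Call parser.setLanguage(language) and report a likely binding mismatch
// instead of letting the TypeError propagate.
function trySetLanguage(parser, language) {
  try {
    parser.setLanguage(language)
    return { ok: true }
  } catch (err) {
    if (err instanceof TypeError && /Invalid language object/.test(err.message)) {
      return {
        ok: false,
        hint: 'The grammar package may have been built for a different tree-sitter binding version.',
      }
    }
    throw err // anything else is unexpected, so rethrow it
  }
}
```

With the scripts in this thread, this would be called as trySetLanguage(new Parser(), JavaScript) in place of the direct parser.setLanguage call.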

amaanq commented 4 weeks ago

The latest release is compatible with the NAPI bindings, so this can be closed unless you can reproduce it with that release as well.