wenkokke / tree-sitter-talon

Tree Sitter parser for Talon files.
MIT License
8 stars 5 forks source link

Upgrade tree-sitter to v0.21 #57

Closed pokey closed 5 months ago

pokey commented 5 months ago

I'm trying to bump tree-sitter in Cursorless, and it looks like tree-sitter generate needs to be re-run with a more recent version of the tree-sitter cli, otherwise we get a runtime error:

rejected promise not handled within 1 second: TypeError: _ is not a function
extensionHostProcess.js:147
stack trace: TypeError: _ is not a function
    at e.<computed> (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:14874)
    at wasm://wasm/00042b72:wasm-function[7]:0xcbe
    at wasm://wasm/000b9d8a:wasm-function[253]:0x24d6d
    at Module._ts_parser_parse_wasm (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:30476)
    at Parser.parse (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:49250)
    at /Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:109:43
    at Generator.next (<anonymous>)
    at fulfilled (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:5:58)

I tried to do the bump myself:

From ca88886c1369388a947059d37b44abf8ffc38d77 Mon Sep 17 00:00:00 2001
From: Pokey Rule <755842+pokey@users.noreply.github.com>
Date: Mon, 10 Jun 2024 17:23:56 +0100
Subject: [PATCH] attempt to bump tree-sitter to 0.21

---
 package.json | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/package.json b/package.json
index 32b7de1..87a2dbb 100644
--- a/package.json
+++ b/package.json
@@ -32,8 +32,11 @@
     "nan": "^2.16.0"
   },
   "devDependencies": {
-    "tree-sitter-cli": "^0.20.8",
-    "web-tree-sitter": "^0.20"
+    "tree-sitter-cli": "^0.22.6",
+    "web-tree-sitter": "^0.22.6"
+  },
+  "peerDependencies": {
+    "tree-sitter": "^0.21.1"
   },
   "tree-sitter": [
     {
-- 
2.43.0

But when I run npm install, I get an error in tree-sitter generate:

> tree-sitter generate

thread 'main' panicked at cli/src/generate/render.rs:259:63:
no entry found for key
stack backtrace:
   0:        0x100b5e444 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h01b2beffade888b2
   1:        0x100b827a4 - core::fmt::write::hbadb443a71b75f23
   2:        0x100b59d94 - std::io::Write::write_fmt::hc09d7755e3ead5f0
   3:        0x100b5e29c - std::sys_common::backtrace::print::h28349e5c25acbac7
   4:        0x100b5fbe4 - std::panicking::default_hook::{{closure}}::hd24b6196784d991e
   5:        0x100b5f8c8 - std::panicking::default_hook::hfcec80a2720c8c73
   6:        0x100b604d8 - std::panicking::rust_panic_with_hook::h84760468187ddc85
   7:        0x100b5fec4 - std::panicking::begin_panic_handler::{{closure}}::he666a5eb600a7203
   8:        0x100b5e8c8 - std::sys_common::backtrace::__rust_end_short_backtrace::h592f44d2bf9f843f
   9:        0x100b5fc3c - _rust_begin_unwind
  10:        0x100bf53cc - core::panicking::panic_fmt::h98bbf7bdf4994454
  11:        0x100b80284 - core::panicking::panic_display::h9e44296eb53a3be8
  12:        0x100bf5398 - core::option::expect_failed::ha53d7c88236a0849
  13:        0x1005e9334 - tree_sitter_cli::generate::render::Generator::generate::h1d81c2206acd9eba
  14:        0x1005ef5c4 - tree_sitter_cli::generate::render::render_c_code::h37bf660689206b05
  15:        0x1006035a0 - tree_sitter_cli::generate::generate_parser_for_grammar_with_opts::hd06f125844860c46
  16:        0x100602b90 - tree_sitter_cli::generate::generate_parser_in_directory::h1f79885d9ec5f2b7
  17:        0x100575260 - tree_sitter::run::h99b955e7357ae881
  18:        0x100574280 - tree_sitter::main::h9c9e91ed7653bfdb
  19:        0x1005ae214 - std::sys_common::backtrace::__rust_begin_short_backtrace::h729442224d13961d
  20:        0x1005ab2a0 - std::rt::lang_start::{{closure}}::hc1f65272a5933202
  21:        0x100b53008 - std::rt::lang_start_internal::h39923ab4c3913741
  22:        0x1005917c8 - _main

I found https://github.com/tree-sitter/tree-sitter/issues/768, but I can't tell if it's the same issue. Have you seen anything like that before?

pokey commented 5 months ago

Ok I tried to reduce this one as much as possible:

module.exports = grammar({
  name: "talon",

  rules: {
    source_file: ($) => $.matches,

    matches: ($) => seq(repeat($.match), repeat1("-")),

    match: ($) => seq(repeat("and"), ":"),

    word: ($) => /[\p{Letter}][\p{Letter}]*/,
  },
});

that still exhibits the problem. Notice the unused word rule. If I remove it, or if I try to simplify that regex at all, then tree-sitter generate works

wenkokke commented 5 months ago

Is there an unused word rule that we can remove too solve this issue?

pokey commented 5 months ago

Fixed it; it was a different rule https://github.com/wenkokke/tree-sitter-talon/pull/58

pokey commented 5 months ago

Now I'm running into another issue, though. Seems like c++ is no longer supported for external scanners; they have to be c. It claims that they should still work, but I'm getting the following error:

This external scanner uses a symbol that isn't available to wasm parsers.

Missing symbols:
    _Znwm
    _ZdlPvm

Available symbols:
    calloc
    free
    iswalnum
    iswalpha
    iswblank
    iswdigit
    iswlower
    iswspace
    iswupper
    iswxdigit
    malloc
    memchr
    memcmp
    memcpy
    memmove
    memset
    realloc
    strcmp
    strlen
    strncat
    strncmp
    strncpy
    towlower
    towupper

A bit of googling indicates that those symbols _Znwm and _ZdlPvm correspond to new and delete, respectively

wenkokke commented 5 months ago

I don’t have the capacity or ability to rewrite the scanner in C, but I would accept a PR that does the rewrite.

wenkokke commented 5 months ago

The tree-sitter website says:

C++ scanners are now deprecated and will be removed in the near future. While it is currently possible to write an external scanner in C++, it can be difficult to get working cross-platform and introduces extra requirements; therefore it is greatly preferred to use C.

pokey commented 5 months ago

The tree-sitter website says:

C++ scanners are now deprecated and will be removed in the near future. While it is currently possible to write an external scanner in C++, it can be difficult to get working cross-platform and introduces extra requirements; therefore it is greatly preferred to use C.

yes that's what I was referring to. I read that as "it still works today", but will stop working in the future. But seems like it is actually broken today

pokey commented 5 months ago

Fwiw I was able to get things working with this patch to tree-sitter:

From 4bdc6937f97f011029b43564d8c9a9111b79fe37 Mon Sep 17 00:00:00 2001
From: Pokey Rule <755842+pokey@users.noreply.github.com>
Date: Tue, 11 Jun 2024 12:20:28 +0100
Subject: [PATCH] Support `new` and `delete` symbols

---
 lib/src/wasm/stdlib-symbols.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/src/wasm/stdlib-symbols.txt b/lib/src/wasm/stdlib-symbols.txt
index 1b6d789e..d1252131 100644
--- a/lib/src/wasm/stdlib-symbols.txt
+++ b/lib/src/wasm/stdlib-symbols.txt
@@ -22,3 +22,5 @@
 "strncpy",
 "towlower",
 "towupper",
+"_ZdlPvm",
+"_Znwm",
-- 
2.43.0
pokey commented 5 months ago

And even with that patch, I'm actually running into the same issue as before I ran the upgrade:

rejected promise not handled within 1 second: TypeError: _ is not a function
extensionHostProcess.js:147
stack trace: TypeError: _ is not a function
    at e.<computed> (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:14874)
    at wasm://wasm/00042b72:wasm-function[7]:0xcbe
    at wasm://wasm/000b9d8a:wasm-function[253]:0x24d6d
    at Module._ts_parser_parse_wasm (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:30476)
    at Parser.parse (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:49250)
    at /Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:109:43
    at Generator.next (<anonymous>)
    at fulfilled (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:5:58)
pokey commented 5 months ago

Ah ok looks like my hack from https://github.com/wenkokke/tree-sitter-talon/issues/57#issuecomment-2160496737 backfired. It's failing on this line in the wasm:

    call $env._Znwm

Note that that _Znwm is the mangled symbol for operator new. Looks like we need to bite the bullet and migrate the external scanner to C 😭

pokey commented 5 months ago

Ok migrated in https://github.com/wenkokke/tree-sitter-talon/pull/59. Thanks ChatGPT

wenkokke commented 5 months ago

My old patches for tree-sitter added symbols, but I think they've made their code for it more user friendly by using the user facing names for the symbols.

wenkokke commented 5 months ago

Which is to say, either add new and delete or somewhere else they still have the internal symbols.

wenkokke commented 5 months ago

The change shouldn't be that difficult to implement. We basically just need some C implementation of vectors. The rest ports over pretty easily.

wenkokke commented 5 months ago

We could honestly probably base our adaptation on whatever changes the Python scanner has made.

pokey commented 5 months ago

I have this working on my fork now https://github.com/wenkokke/tree-sitter-talon/compare/dev...pokey:tree-sitter-talon:dev (notice there are 3 commits; one of them is big because it runs tree-sitter generate with new tree-sitter version)

Feel free to diff my scanner.c with your scanner.cc and you'll see the changes are quite mechanical

pokey commented 5 months ago

Ok Cursorless is now relying on my fork of tree-sitter-talon while we wait for this upgrade. Ping me when you get a chance to do the upgrade and we'll switch back to your repo. Here's our tracker issue https://github.com/cursorless-dev/vscode-parse-tree/issues/85

wenkokke commented 5 months ago

Fixed in HEAD.