pokey closed this 4 months ago
Ok I tried to reduce this one as much as possible:
module.exports = grammar({
  name: "talon",
  rules: {
    source_file: ($) => $.matches,
    matches: ($) => seq(repeat($.match), repeat1("-")),
    match: ($) => seq(repeat("and"), ":"),
    word: ($) => /[\p{Letter}][\p{Letter}]*/,
  },
});
that still exhibits the problem. Notice the unused word rule. If I remove it, or if I try to simplify that regex at all, then tree-sitter generate works.
Is there an unused word rule that we can remove to solve this issue?
Fixed it; it was a different rule https://github.com/wenkokke/tree-sitter-talon/pull/58
Now I'm running into another issue, though. Seems like C++ is no longer supported for external scanners; they have to be C. The tree-sitter docs claim that C++ scanners should still work, but I'm getting the following error:
This external scanner uses a symbol that isn't available to wasm parsers.
Missing symbols:
_Znwm
_ZdlPvm
Available symbols:
calloc
free
iswalnum
iswalpha
iswblank
iswdigit
iswlower
iswspace
iswupper
iswxdigit
malloc
memchr
memcmp
memcpy
memmove
memset
realloc
strcmp
strlen
strncat
strncmp
strncpy
towlower
towupper
A bit of googling indicates that those symbols, _Znwm and _ZdlPvm, correspond to new and delete, respectively.
I don’t have the capacity or ability to rewrite the scanner in C, but I would accept a PR that does the rewrite.
The tree-sitter website says:
C++ scanners are now deprecated and will be removed in the near future. While it is currently possible to write an external scanner in C++, it can be difficult to get working cross-platform and introduces extra requirements; therefore it is greatly preferred to use C.
Yes, that's what I was referring to. I read that as "it still works today, but will stop working in the future". But it seems like it is actually broken today.
Fwiw I was able to get things working with this patch to tree-sitter:
From 4bdc6937f97f011029b43564d8c9a9111b79fe37 Mon Sep 17 00:00:00 2001
From: Pokey Rule <755842+pokey@users.noreply.github.com>
Date: Tue, 11 Jun 2024 12:20:28 +0100
Subject: [PATCH] Support `new` and `delete` symbols
---
lib/src/wasm/stdlib-symbols.txt | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/src/wasm/stdlib-symbols.txt b/lib/src/wasm/stdlib-symbols.txt
index 1b6d789e..d1252131 100644
--- a/lib/src/wasm/stdlib-symbols.txt
+++ b/lib/src/wasm/stdlib-symbols.txt
@@ -22,3 +22,5 @@
"strncpy",
"towlower",
"towupper",
+"_ZdlPvm",
+"_Znwm",
--
2.43.0
And even with that patch, I'm actually running into the same issue as before I ran the upgrade:
rejected promise not handled within 1 second: TypeError: _ is not a function
extensionHostProcess.js:147
stack trace: TypeError: _ is not a function
at e.<computed> (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:14874)
at wasm://wasm/00042b72:wasm-function[7]:0xcbe
at wasm://wasm/000b9d8a:wasm-function[253]:0x24d6d
at Module._ts_parser_parse_wasm (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:30476)
at Parser.parse (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/node_modules/web-tree-sitter/tree-sitter.js:1:49250)
at /Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:109:43
at Generator.next (<anonymous>)
at fulfilled (/Users/pokey/.vscode/extensions/pokey.parse-tree-0.30.0/out/extension.js:5:58)
Ah ok looks like my hack from https://github.com/wenkokke/tree-sitter-talon/issues/57#issuecomment-2160496737 backfired. It's failing on this line in the wasm:
call $env._Znwm
Note that _Znwm is the mangled symbol for operator new. Looks like we need to bite the bullet and migrate the external scanner to C 😭
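For anyone landing here with the same error, here is a rough sketch of what the allocation side of that migration can look like. The function names follow tree-sitter's `tree_sitter_<language>_external_scanner_<action>` convention with the grammar name `talon` from above; the `Scanner` struct and its `indent_length` field are hypothetical placeholders, not the contents of the real scanner. The point is only that `calloc` and `free` are on the list of symbols available to wasm parsers, while the C++ `new` and `delete` operators are not.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scanner state, for illustration only. */
typedef struct {
  uint16_t indent_length;
} Scanner;

/* A C++ scanner would typically allocate with `new` here, which pulls in
 * the mangled operator-new symbol. calloc is on the wasm allow-list. */
void *tree_sitter_talon_external_scanner_create(void) {
  /* Zero-initialised; the result is NULL if allocation fails. */
  return calloc(1, sizeof(Scanner));
}

/* Likewise, `delete` becomes a plain free. */
void tree_sitter_talon_external_scanner_destroy(void *payload) {
  free(payload); /* free(NULL) is a no-op */
}
```

Because the payload is plain zeroed memory, there is no constructor or destructor to run, so any state the scanner needs has to be initialised and released explicitly in these two functions.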
Ok migrated in https://github.com/wenkokke/tree-sitter-talon/pull/59. Thanks ChatGPT
My old patches for tree-sitter added symbols, but I think they've made their code for it more user-friendly by using the user-facing names for the symbols. Which is to say, either add new and delete, or somewhere else they still have the internal symbols.
The change shouldn't be that difficult to implement. We basically just need some C implementation of vectors. The rest ports over pretty easily.
We could honestly probably base our adaptation on whatever changes the Python scanner has made.
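As a rough idea of how small such a C vector can be, here is a minimal sketch of a growable stack of `uint16_t` values. The `IndentVec` type and the `vec_*` names are made up for this example and are not taken from the talon or Python scanners. It only calls `realloc` and `free`, both of which appear in the list of symbols available to wasm parsers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array, standing in for std::vector<uint16_t>. */
typedef struct {
  uint16_t *data;
  uint32_t len;
  uint32_t cap;
} IndentVec;

static void vec_init(IndentVec *v) {
  v->data = NULL;
  v->len = 0;
  v->cap = 0;
}

/* Appends a value, doubling capacity when full. Returns false if
 * allocation fails, in which case the old contents are left intact. */
static bool vec_push(IndentVec *v, uint16_t value) {
  if (v->len == v->cap) {
    uint32_t new_cap = v->cap ? v->cap * 2 : 8;
    uint16_t *grown = realloc(v->data, new_cap * sizeof(uint16_t));
    if (grown == NULL) return false;
    v->data = grown;
    v->cap = new_cap;
  }
  v->data[v->len++] = value;
  return true;
}

/* Removes the last value into *out. Returns false if the vector is empty. */
static bool vec_pop(IndentVec *v, uint16_t *out) {
  if (v->len == 0) return false;
  *out = v->data[--v->len];
  return true;
}

/* Releases the buffer and resets the vector to the empty state. */
static void vec_free(IndentVec *v) {
  free(v->data);
  vec_init(v);
}
```

A scanner would embed one of these in its state struct, call `vec_init` when the scanner is created and `vec_free` when it is destroyed.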
I have this working on my fork now: https://github.com/wenkokke/tree-sitter-talon/compare/dev...pokey:tree-sitter-talon:dev (notice there are 3 commits; one of them is big because it runs tree-sitter generate with the new tree-sitter version).
Feel free to diff my scanner.c with your scanner.cc and you'll see the changes are quite mechanical.
Ok Cursorless is now relying on my fork of tree-sitter-talon while we wait for this upgrade. Ping me when you get a chance to do the upgrade and we'll switch back to your repo. Here's our tracker issue https://github.com/cursorless-dev/vscode-parse-tree/issues/85
Fixed in HEAD.
I'm trying to bump tree-sitter in Cursorless, and it looks like tree-sitter generate needs to be re-run with a more recent version of the tree-sitter cli, otherwise we get a runtime error:
I tried to do the bump myself:
But when I run npm install, I get an error in tree-sitter generate:
I found https://github.com/tree-sitter/tree-sitter/issues/768, but I can't tell if it's the same issue. Have you seen anything like that before?