tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter
MIT License
93 stars 31 forks source link

Failed to build wasm #124

Closed wszgrcy closed 4 months ago

wszgrcy commented 8 months ago

I have built many times,I have succeeded a few times and also failed many times But every time it takes up a lot of memory, the computer's 64GB of memory is all occupied, and then it crashes

I use npx tree-sitter build-wasm xxx to build branch: master May I ask how to provide more detailed information to help troubleshoot issues?

savq commented 7 months ago

Tree-sitter-julia does require memory than other grammars, but this seems excessive. I'll try to troubleshoot and if I find the issue I'll add wasm to CI to avoid regressions.

piechologist commented 5 months ago

@savq Do you have any idea how to improve the situation? There is some interest in making a Julia extension for the Zed editor here. It looks like we need a machine with 128 GB RAM just to make the julia.wasm.

I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?

savq commented 5 months ago

I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?

The operators aren't really a problem. The biggest problem with julia syntax are sequences of expressions without any delimiter: macros, matrices and juxtapositions (sort of).

I made an experiment in this branch, refactoring some stuff and removing those problematic forms entirely. The resulting parser.c is still around 33M, and the wasm build takes around 10 minutes in my machine, but it could be a starting point for the zed extension.

maxbrunsfeld commented 5 months ago

I did some work on tree-sitter to improve this to some degree - I haven’t tested it in the Julia grammar.

https://github.com/tree-sitter/tree-sitter/pull/3234

piechologist commented 4 months ago

The operators aren't really a problem.

@savq, I tossed things around again and now I'm pretty sure that the number of operators is at least one of the problems. I missed the tree-sitter generate step 2 weeks ago, sorry.

Today I removed all but ≈ 50 operators. RAM usage went down from 60 GB to 4 GB with tree-sitter v0.20.8. Combined with your "smallish" branch, parser.c is 20 MB. That looks much better!

Unfortunately, I guess Zed requires a newer tree-sitter version (I see errors in Zed's log). Trying to build the WASM (tree-sitter build -w) with the latest tree-sitter v0.22.2 gives me the following error message:

This external scanner uses a symbol that isn't available to wasm parsers.

Missing symbols:
    exit

Available symbols:
    calloc
    free
    iswalnum
    iswalpha
    iswblank
    iswdigit
    iswlower
    iswspace
    iswupper
    iswxdigit
    malloc
    memchr
    memcmp
    memcpy
    memmove
    memset
    realloc
    strcmp
    strlen
    strncpy
    towlower
    towupper

What does that mean? The only exits I can find:

> grep -R exit
./src/scanner.c:  if (arr == NULL) exit(1);
./src/scanner.c:  if (stack == NULL) exit(1);
./src/scanner.c:  if (stack->len >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) exit(1);
./src/scanner.c:  if (stack->len == 0) exit(1);
clason commented 4 months ago

Exactly what it says on the tin: the exit function from libc is not available in wasm. #136 fixes this (among other things) by using the built-in abort instead.

piechologist commented 4 months ago

@clason: Thank you, I'd assumed scanner.c was auto-generated. I replaced the four instances of exit(0) with abort() and it built.

This is my setup:

In Zed, the syntax tree for some of my .jl files looks pretty clean considering all the hacks above. Basic syntax highlighting works.

I'd call it progress but it's certainly not safe for public consumption. I'll wait for #136 getting merged and try again.

clason commented 4 months ago

I'd assumed scanner.c was auto-generated.

It is not; it needs to be hand-written. And, yes, you should absolutely wait for the mentioned PRs to be merged.