tree-sitter / java-tree-sitter

Java bindings to the Tree-sitter parsing library
https://tree-sitter.github.io/java-tree-sitter/
MIT License
27 stars 6 forks source link

Allow tree-sitter library path to be configurable at runtime #36

Closed bioball closed 2 months ago

bioball commented 2 months ago

Thanks for this library!

We're using java-tree-sitter as part of our CLI, which gets shipped as a jar to our users' machines. We'd like to be able to control exactly where the tree-sitter library lives, to prevent users from having to spawn Java with custom flags (e.g. -Djava.library.path).

We can do this if java-tree-sitter provided a way to specify the library path programmatically at runtime. Java 22's FFI API allows this (see java.lang.foreign.SymbolLookup#libraryLookup(java.nio.file.Path, java.lang.foreign.Arena)).

It'd be great if the java-tree-sitter library gave us a way to do this, too. It'd involve changing the logic here from being hard-coded.

For now, we plan on modifying TreeSitter.java and re-compiling it during our own build. For reference: https://github.com/apple/pkl-lsp/pull/23

ObserverOfTime commented 2 months ago

It's possible with Language#load.

Alternatively you can modify java.library.path or the respective env var in your library code.

P.S. You might wanna use kotlin-tree-sitter instead.

bioball commented 2 months ago

That allows us to control the location of our generated parser (e.g. libtree-sitter-pkl.dylib), but not the tree-sitter library (e.g. libtree-sitter.dylib). As far as I can tell, java-tree-sitter requires that it be installed on the system, or provided via java.library.path), neither which are great solutions for us.

I'm curious: why use kotlin-tree-sitter? Is java-tree-sitter not ready for production use yet?

ObserverOfTime commented 2 months ago

As far as I can tell, java-tree-sitter requires that it be installed on the system, or provided via java.library.path)

That's correct, and likely not something that can easily be fixed since that part is auto-generated via jextract.

I'm curious: why use kotlin-tree-sitter? Is java-tree-sitter not ready for production use yet?

  1. You're using Kotlin already.
  2. It supports older JVM versions plus native.
  3. It bundles the tree-sitter library in its jar.

Neither of them is quite ready for production use yet.

bioball commented 2 months ago

From what I can tell, that part is hand-authored: https://github.com/tree-sitter/java-tree-sitter/blob/53c2d15ea8abd16c3a36bfe255fe921a6e2950f0/src/test/java/io/github/treesitter/jtreesitter/languages/TreeSitterJava.java#L25-L32

Is that not the case?

ObserverOfTime commented 2 months ago

That's not where the tree-sitter library is loaded. The generated file is not in the repo.

bioball commented 2 months ago

Yeah, I get that. But, what I'd like is a way to provide my own java.nio.file.Path to SymbolLookup.libraryLookup(), but it's currently hard-coded. If we can, it means that I can provide the tree-sitter libraries myself when users start our jar.

ObserverOfTime commented 2 months ago

Again, you can provide the parser libraries via Language#load but not the tree-sitter library itself.

Marcono1234 commented 2 months ago

That is a current limitation of jextract, though there are plans to possibly support this in the future, see this message by one of the JDK developers: https://mail.openjdk.org/pipermail/jextract-dev/2024-August/001920.html And maybe this feature request is about the same topic: https://bugs.openjdk.org/browse/CODETOOLS-7903186

But even then the question is how to expose this as part of the jtreesitter API? Maybe by using a static volatile SymbolLookup internally which can be overwritten?

A safer but more verbose solution would be to have a JTreesitter(SymbolLookup) factory or similar which you can use as alternative to obtain the objects, e.g. jtreesitter.newLanguage(...), jtreesitter.newParser(...)? (and keep the existing API which then instead uses the current default symbol lookup?)

Marcono1234 commented 2 months ago

@ObserverOfTime, but since you are tweaking the generated source in the TreeSitter already anyway, would it be possible to change it to something similar as you did in https://github.com/tree-sitter/tree-sitter-java/pull/182? That is, use SymbolLookup#libraryLookup by default but if that fails fall back to SymbolLookup#loaderLookup[^1]?

try {
    return SymbolLookup.libraryLookup(..., arena);
} catch (IllegalArgumentException e) {
    return SymbolLookup.loaderLookup();
}

Maybe that would already help for your use case @bioball, because it seems you mainly want to have more control over the path, which would then be possible either using java.library.path or with System#load(String)[^2].

[^1]: https://bugs.openjdk.org/browse/CODETOOLS-7903654 mentions that SymbolLookup#libraryLookup has better integration with the dynamic linker, but I am not sure if that is relevant here. [^2]: For the use cases where you want not only control over the path but also when the library is unloaded, that would not help because it requires a SymbolLookup, calling Arena#close to unload the library. But maybe those use cases are rarer.