Marcono1234 closed this issue 2 months ago.
The plan is to eventually integrate those bindings into the parsers (see tree-sitter/tree-sitter-java#182).
But that is specifically for tree-sitter-java, right? That would certainly be useful, but I was thinking of a more general solution for all parsers, e.g. Python, JSON, ..., since they all have a tree_sitter_<lang> function with the same signature (?).
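Because every generated parser exports the same kind of entry point (in current tree-sitter versions, const TSLanguage *tree_sitter_<lang>(void) in the generated parser.c), a single naming rule and a single FFM function descriptor can cover Python, JSON, Java and any other grammar. A minimal sketch, assuming Java 22+ (where java.lang.foreign is final); LanguageSymbols and its members are hypothetical names, not jtreesitter API:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

// Hypothetical helper (not part of jtreesitter). Assumption: every generated
// parser exports `const TSLanguage *tree_sitter_<lang>(void)`, so one symbol
// naming rule and one descriptor are enough for all grammars.
final class LanguageSymbols {
    private LanguageSymbols() {}

    // Shared descriptor: no parameters, returns a pointer (the TSLanguage *).
    static final FunctionDescriptor LANGUAGE_FUNCTION =
            FunctionDescriptor.of(ValueLayout.ADDRESS);

    // Builds the exported symbol name, e.g. "python" -> "tree_sitter_python".
    static String functionName(String languageName) {
        if (languageName == null || languageName.isEmpty()) {
            throw new IllegalArgumentException("Language name must not be empty");
        }
        return "tree_sitter_" + languageName;
    }
}
```

For example, LanguageSymbols.functionName("json") yields "tree_sitter_json", and the same LANGUAGE_FUNCTION descriptor is reused for every grammar.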
The CLI will generate bindings for all parsers like it does for other languages.
Ah, I think I misunderstood you. Is the plan to generate Java bindings for all parsers, and the tree-sitter-java one was just an example? That would be great then!
But would it make sense nonetheless to add a generic loadLanguage method here, for cases where a repository does not include a bindings/java/.../TreeSitter<lang>.java yet?
I was thinking of something like this:
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;

public final class Language {
    /**
     * Loads a language using the given symbol lookup for the native library.
     * For example:
     * {@snippet lang=java :
     * Path pathToLibrary = Path.of("libtree-sitter-python.so");
     * SymbolLookup libraryLookup = SymbolLookup.libraryLookup(pathToLibrary, Arena.ofAuto());
     * Language language = Language.loadLanguage(libraryLookup, "python");
     * }
     *
     * @throws IllegalArgumentException If the Tree-sitter language function cannot be found using the symbol lookup
     */
    public static Language loadLanguage(SymbolLookup symbolLookup, String languageName) throws IllegalArgumentException {
        String functionName = "tree_sitter_" + languageName;
        MemorySegment functionAddress = symbolLookup.find(functionName)
            .orElseThrow(() -> new IllegalArgumentException("Language function '%s' not found".formatted(functionName)));
        // Pointer to the opaque TSLanguage struct; the target layout gives the returned segment a size
        var voidPtr = ValueLayout.ADDRESS.withTargetLayout(MemoryLayout.sequenceLayout(Long.MAX_VALUE, ValueLayout.JAVA_BYTE));
        // tree_sitter_<lang> takes no arguments and returns that pointer
        var funcDesc = FunctionDescriptor.of(voidPtr);
        var function = Linker.nativeLinker().downcallHandle(functionAddress, funcDesc);
        MemorySegment languagePointer;
        try {
            // invokeExact needs the cast so the call site matches the handle's return type
            languagePointer = ((MemorySegment) function.invokeExact()).asReadOnly();
        } catch (Throwable t) {
            throw new RuntimeException("Failed to call language function", t);
        }
        return new Language(languagePointer);
    }

    /**
     * Creates a new instance from the given language pointer.
     *
     * <p>Normally you don't have to obtain the language pointer yourself. Instead, you can either use the
     * generated Java bindings for a parser, for example:
     * {@snippet lang=java :
     * var pointer = TreeSitterPython.language();
     * Language language = new Language(pointer);
     * }
     * Or you can use {@link #loadLanguage(SymbolLookup, String)} to obtain a {@code Language} instance.
     *
     * @implNote It is up to the caller to ensure that the pointer is valid.
     *
     * @throws IllegalArgumentException If the language version is incompatible.
     */
    public Language(MemorySegment address) {
        // ...
    }

    // ...
}
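One detail worth double-checking in the Javadoc example above is Arena.ofAuto(). As far as I can tell from the SymbolLookup.libraryLookup documentation, the library is unloaded when the arena is closed, which for an automatic arena happens once it becomes unreachable, while the pointer returned by the downcall is not tied to that arena. A sketch of the lookup step using Arena.global() instead, so the library stays loaded for the lifetime of the JVM (assumes Java 22+; ParserLibrary and findLanguageFunction are hypothetical names):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.nio.file.Path;

// Hypothetical helper: opens a parser's shared library and resolves its
// tree_sitter_<lang> symbol. Arena.global() is never closed, so the library
// is never unloaded while a TSLanguage pointer from it may still be in use.
final class ParserLibrary {
    private ParserLibrary() {}

    static MemorySegment findLanguageFunction(Path library, String languageName) {
        // libraryLookup throws IllegalArgumentException if the library cannot be loaded.
        SymbolLookup lookup = SymbolLookup.libraryLookup(library, Arena.global());
        String functionName = "tree_sitter_" + languageName;
        return lookup.find(functionName)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Language function '%s' not found".formatted(functionName)));
    }
}
```

With this shape, a missing library and a missing tree_sitter_<lang> symbol both surface as IllegalArgumentException, the exception type the proposal already documents for the lookup failure.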
The Javadoc here intentionally refers to tree-sitter-python to reduce confusion and to indicate that it works with any parser; otherwise a user might confuse tree-sitter-java with java-tree-sitter / jtreesitter, or think that jtreesitter only works with the Java parser.
But would it make sense nonetheless to add a generic loadLanguage method here
Only until the bindings are autogenerated, at which point it'll be deprecated.
(Please correct me if anything of the following is wrong.)
If I understand correctly, every parser implementation has a tree_sitter_<lang> function, and it always has the same signature. Currently jtreesitter only provides a Language(MemorySegment) constructor, so you have to write boilerplate code that looks up the tree_sitter_<lang> function and invokes it (as done in the test code). This can be an obstacle for new users of jtreesitter, because they either have to be somewhat familiar with java.lang.foreign or blindly copy code they don't understand.
It would be useful if Language provided a convenience method for this, for example:
The user could then easily use SymbolLookup#libraryLookup to load the library and pass the resulting lookup to that Language#loadLanguage method.
If you want, I can try to create a proof-of-concept PR for this.
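For comparison, this is roughly the boilerplate a user has to write today when only the Language(MemorySegment) constructor exists: find tree_sitter_<lang>, bind it as a function that takes no arguments and returns a pointer, and call it. A sketch assuming Java 22+; LanguagePointers and languagePointer are hypothetical names, and the resulting segment would then be passed to new Language(...):

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical helper showing the manual steps: look up tree_sitter_<lang>,
// create a downcall handle for `const TSLanguage *(void)`, and invoke it.
final class LanguagePointers {
    private LanguagePointers() {}

    // No parameters, returns a pointer.
    private static final FunctionDescriptor LANGUAGE_FUNCTION =
            FunctionDescriptor.of(ValueLayout.ADDRESS);

    static MemorySegment languagePointer(SymbolLookup lookup, String languageName) {
        String functionName = "tree_sitter_" + languageName;
        MemorySegment function = lookup.find(functionName)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Language function '%s' not found".formatted(functionName)));
        MethodHandle handle = Linker.nativeLinker().downcallHandle(function, LANGUAGE_FUNCTION);
        try {
            // invokeExact requires the cast to match the handle's return type exactly.
            return (MemorySegment) handle.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException("Failed to call " + functionName, t);
        }
    }
}
```

Unlike the proposal, this binds the return value as a plain ValueLayout.ADDRESS, so the returned segment has length zero; whether the Language constructor needs a sized segment is an assumption to verify against jtreesitter.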