tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License
151 stars 36 forks source link

bump tree-sitter in Cargo.toml #85

Closed tek closed 2 years ago

tek commented 2 years ago

@VixieTSQ I updated the version and ran cargo update, is that doing anything for you?

VixieTSQ commented 2 years ago

This works! Says the compiler at least. And that's because the compiler assumes the Language it's getting back from the FFI call is indeed a version 0.20 Language. But I'm scared that might not be correct, which would cause UB. And I assume the package.json specifies dependencies for the javascript or whatever? I've never used it but it looks like it still specifies 0.19. Whatever you did to generate the rust binding and everything should probably be redone, instead of just changing the dependency for rust.

tek commented 2 years ago

I deleted the Rust files and ran tree-sitter generate. Didn't change much, but for some reason it referred to tree_sitter_javascript::language() before?? I also uncommented the block that recompiles scanner.c when changed.

I'm not quite sure I understand your problem description, but since the bindings aren't precompiled, shouldn't everything depend solely on the versions involved when you're building your program?

VixieTSQ commented 2 years ago

@tek Okay so with some further exploration into how tree-sitter works, here's a better explanation... here in lib.rs

extern "C" {
    fn tree_sitter_haskell() -> Language;
}

this defines a function that comes from the file src/parser.c with a return type of Language. Specifically the Language struct from the rust tree-sitter crate. My worry is that parser.c is outdated and returns a Language (the C struct) that is actually different from Language (the rust struct). Since the rustc compiler has no knowledge of C and does no checks for this sort of thing, only assuming these two structs are actually the same thing, this would cause UB if there was a breaking change between version 0.19 and version 0.20 of tree-sitter.

tek commented 2 years ago

Isn't the parser compiled against a tree-sitter version that's under your control? Or do you depend only on this Cargo.toml and the dependency in there is separate from the Rust API that your program uses?

VixieTSQ commented 2 years ago

Isn't the parser compiled against a tree-sitter version that's under your control? Or do you depend only on this Cargo.toml and the dependency in there is separate from the Rust API that your program uses?

Err.. I don't understand (maybe I'm just speaking nonsense and wasting your time, who knows) parser.c is just a file, it's not brought down as a dependency in any dynamic way. Are you saying that parser.c actually depends back on rust code at some point? If so, I didn't know. I don't think that makes it a non-problem even so? My code, using your crate, has the dependency of tree-sitter = "0.20"

tek commented 2 years ago

do you maybe have an example project to illustrate how this works?

VixieTSQ commented 2 years ago

do you maybe have an example project to illustrate how this works?

You wouldn't need a lot to see issues. Something as simple as this would be UB if there was a breaking change from tree-sitter 0.19 and tree-sitter 0.20

// Depend on these in cargo.toml
use tree-sitter-haskell::language;
use tree-sitter; // version 0.20

fn main() {
    let lang = language(); // Actually an FFI call to C
    // This used to be a type mismatch,
    // but now tree-sitter-haskell assumes what it gets back from C is a version 0.20 `Language`
    let lang2: tree-sitter::Language = lang;
}
tek commented 2 years ago

So if we were to synchronize the version field in Cargo.toml with the version in

[dependencies]
tree-sitter = "0.20"

this should be transparent, right?

I assume the Rust bindings aren't published to some package registry, so you're depending on the repo directly? Not sure what the right approach would be to allow you to select a version except for pinning a commit sha.

VixieTSQ commented 2 years ago

I assume the Rust bindings aren't published to some package registry, so you're depending on the repo directly? Not sure what the right approach would be to allow you to select a version except for pinning a commit sha.

It is from the repo directly, and I'm just using version 0.14 of your crate as that's what your cargo.toml specifies. It seems to work perfectly fine.

So if we were to synchronize the version field in Cargo.toml with the version in

[dependencies]
tree-sitter = "0.20"

this should be transparent, right?

That is what we have now. Both tree-sitter-haskell and my project depend on tree-sitter 0.20 and so there is no compiler error. This does not fix the issue whatsoever with the src/parser.c which your project makes an FFI call to. It's completely unrelated. src/parser.c is still in version 0.19 of tree-sitter, and rust dependency management doesn't do anything with that file and has no knowledge of it.

tek commented 2 years ago

In what way is parser.c bound to version 0.19? does it use an API feature that was broken in 0.20?

VixieTSQ commented 2 years ago

In what way is parser.c bound to version 0.19? does it use an API feature that was broken in 0.20?

I don't have specifics, I only assume it's different. parser.c creates an instance of Language (or something creates that. All I know is that parser.c returns it to rust.) and the definition of the Language might be different than it once was in the previous version of tree-sitter. Rust cannot check if it's different or not because rust isn't C.

Language of tree-sitter 0.20 may have more or less or different fields than Language of tree-sitter 0.19.

tek commented 2 years ago

Well if you're compiling parser.c as part of your project, it will use the Language that's present in the C library that your project depends on. If there is an API discrepancy, it will cause a compile error, but there's no ABI involved.

Do you think I'm missing something? Do you see another version of the tree-sitter lib being pulled in somehow?

VixieTSQ commented 2 years ago

Well if you're compiling parser.c as part of your project, it will use the Language that's present in the C library that your project depends on. If there is an API discrepancy, it will cause a compile error, but there's no ABI involved.

Do you think I'm missing something? Do you see another version of the tree-sitter lib being pulled in somehow?

I kinda thought parser.c defined Language itself. Is that not true? Honestly I don't exactly know what parser.c is exactly, nor do I have much experience with tree-sitter. I'm just doing contributions for a project whom uses it.

tek commented 2 years ago

oh, right, now I get it! There appears to be a struct TSLanguage defined in src/tree_sitter/parser.h. Maybe that's the FFI adapter for Rust's Language? In any case, I don't know what the protocol is for ensuring version consistency here – optimally I guess you'd run tree-sitter generate in your project.

VixieTSQ commented 2 years ago

tree-sitter generate for a binary crate? I'm unsure what the command does exactly. I thought you'd have to look for where ever you pulled all those C files from and get the latest versions of them. Question: what part of this repository did you actually code and isn't just generated? I probably should know this...

tek commented 2 years ago

The javascript files and scanner.c are written manually, everything else is generated. I think it is recommended practice to not commit the generated files, maybe that would force them to be generated for the right version when depended upon?

I think we need some more competent advice, @maxbrunsfeld

VixieTSQ commented 2 years ago

I think it is recommended practice to not commit the generated files, maybe that would force them to be generated for the right version when depended upon?

No that doesn't sound correct. All the other tree-sitter-* repos commit the generated files. At least in rust, you are never expected to hold a dependency locally unless you wrote it. I suppose we'll wait for some that competent advice...

tek commented 2 years ago

Sounds like this issue isn't really specific to this repo!

VixieTSQ commented 2 years ago

Sounds like this issue isn't really specific to this repo!

either way, couldn't the parser.c file just be regenerated now? or why don't you just delete everything you didn't make and regenerate it all. That should bring it all up to date, right? Patches shouldn't have breaking changes so you won't have to regenerate for awhile. It's probably fine, but definitely annoying.

tek commented 2 years ago

The current parser has been generated by tree-sitter 0.20.6. The version in package.json (or Cargo.toml) does not have an effect on that, only what version my system's tree-sitter executable is.

VixieTSQ commented 2 years ago

The current parser has been generated by tree-sitter 0.20.6. The version in package.json (or Cargo.toml) does not have an effect on that, only what version my system's tree-sitter executable is.

then this pull looks fine to merge then?

tek commented 2 years ago

glad we could settle this! :smile: