tree-sitter-grammars / tree-sitter-hcl

HCL grammar for tree-sitter
https://tree-sitter-grammars.github.io/tree-sitter-hcl/
Apache License 2.0
92 stars 20 forks source link

feat: rewrite the scanner in C #35

Closed amaanq closed 1 year ago

MichaHoffmann commented 1 year ago

Hey, thanks for the pull request. I wonder why it is needed though, is something wrong with the C++ scanner other than it being a bit ugly?

amaanq commented 1 year ago

It's not that it's ugly (well, subjectively, C++ is ugly to me :laughing:), but that a pure C scanner has better support for pure C projects and is easier to work with with regards to compilation/bundling. I can add a fuzz action here too that will fuzz the code for any bugs like buffer overflows, leaks, segfaults, etc

MichaHoffmann commented 1 year ago

Hey,

One problem i'm seeing is that I think this is a breaking change, downstream projects like nvim-treesitter use the name of the external scanner to compile the parser ( see here https://github.com/nvim-treesitter/nvim-treesitter/blob/ae0415331483bd143f80c186401fb2aa783f33df/lua/nvim-treesitter/parsers.lua#L639 ); due to that i dont feel really comfortable merging this PR yet.

amaanq commented 1 year ago

Well I am a member of nvim-treesitter and I would adapt the changes right away :smile:

MichaHoffmann commented 1 year ago

Problem is; the parser is also used in semgrep and other projects; i really dont want to break them.

amaanq commented 1 year ago

I'm sorry but I don't understand that logic - python, html, bash, c++, and others all were rewritten in C with no complaints from upstream maintainers and are all probably used by more projects than this

MichaHoffmann commented 1 year ago

I'm sorry but I don't understand that logic - python, html, bash, c++, and others all were rewritten in C with no complaints from upstream maintainers and are all probably used by more projects than this

Semgrep for example has not yet updated to the C one ~ https://github.com/tree-sitter/tree-sitter-cpp/blob/a35a275df92e7583df38f2de2562361f2b69987e/src/scanner.cc ( if its the correct grammar )

amaanq commented 1 year ago

I'm afraid that's Semgrep's issue then, as they won't update any fixes/features I or you have made (like one nasty infinite loop bug in Python), and thus shouldn't have any bearing here

MichaHoffmann commented 1 year ago

I wonder, is there a way we can have both scanners? ( the code is pretty much set i would say ); that way i can take a couple weeks to update all downstream projects i know of ( because i got complains about breaking compiling )

MichaHoffmann commented 1 year ago

generally they should be able to just exist in parallel, right?

amaanq commented 1 year ago

I mean..you could but I don't know how cli picks the scanner to compile

rust/binding.gyp just need the file renamed to ideally prefer the C version.

MichaHoffmann commented 1 year ago

lets maybe just do this: merge and not release it for a bit and create PRs to bump + fix compiling issues in projects that i know about; does that sound acceptable?

amaanq commented 1 year ago

That's totally fine! Thanks for coming to a compromise :slightly_smiling_face:

MichaHoffmann commented 1 year ago

Ill break out the old fuzzer today in the evening; if that reveals no issues i feel comfortable merging!

amaanq commented 1 year ago

Just marked a few more functions static that weren't