semgrep / ocaml-tree-sitter-semgrep

Generate parsers from tree-sitter grammars extended to support Semgrep patterns
GNU General Public License v3.0

Upgrade to tree-sitter 0.22.6 #485

Closed: mjambon closed this 5 months ago

mjambon commented 5 months ago

Uses https://github.com/semgrep/ocaml-tree-sitter-core/pull/77

This is the long list of parsers that can't build with tree-sitter 0.22.6. Most of them only need an upgrade of the tree-sitter-xxx submodule:

*** Failed to build or test the following languages: bash c-sharp dart elixir fsharp hack hcl html java javascript lua php python r ruby rust sfapex sml solidity typescript vue

Only c-sharp and hack are known to fail permanently due to memory exhaustion during tree-sitter generate.
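For anyone triaging the list, here is a rough sketch (not part of this PR) that splits the failure line above into the two languages known to fail permanently and the remaining ones, which are only candidates for a submodule upgrade. The `split_failures` helper and the `KNOWN_PERMANENT` set are hypothetical names introduced for illustration; the log line is copied from the output above.

```python
# The failure line as reported in the build output above.
log_line = (
    "*** Failed to build or test the following languages: "
    "bash c-sharp dart elixir fsharp hack hcl html java javascript lua php "
    "python r ruby rust sfapex sml solidity typescript vue"
)

# Languages reported above as failing permanently (memory exhaustion
# during `tree-sitter generate`). Hypothetical constant for this sketch.
KNOWN_PERMANENT = {"c-sharp", "hack"}


def split_failures(line: str) -> tuple[list[str], list[str]]:
    """Return (permanent, upgrade_candidates) parsed from a failure line."""
    # Everything after the first colon is the space-separated language list.
    _, _, names = line.partition(":")
    failed = names.split()
    permanent = [lang for lang in failed if lang in KNOWN_PERMANENT]
    candidates = [lang for lang in failed if lang not in KNOWN_PERMANENT]
    return permanent, candidates


permanent, candidates = split_failures(log_line)
print(f"{len(permanent)} permanent: {' '.join(permanent)}")
print(f"{len(candidates)} upgrade candidates: {' '.join(candidates)}")
```

Note that "upgrade candidates" is deliberately cautious: most of them are expected to only need a tree-sitter-xxx submodule upgrade, but that has not been checked for each language.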


aryx commented 5 months ago

fails in CI

mjambon commented 5 months ago

@aryx yes, it's expected to fail for most languages. Their tree-sitter-xxx submodules need to be upgraded, and I don't want to do that without also doing the OCaml integration work.

mjambon commented 5 months ago

So what is the plan?

See internal note: https://semgrepinc.slack.com/archives/C048LGPK46L/p1718771440113229

Who is gonna update the failing languages?

Whoever needs to generate a parser for a language.

The problem is that upgrading a parser requires OCaml integration work in semgrep, and I'm not willing to do that (unless the parser is unused). I'll look into upgrading the unused parsers.

Normally, the semgrep developer working on a parser would upgrade to the latest tree-sitter parser. If they only want to extend something, such as how semgrep's metavariables are parsed, without upgrading, they can do so by continuing to use the older tree-sitter (0.20.6).

Should we merge this before we fix all the failing languages?

Yes, because this doesn't affect the semgrep repo.