posit-dev / ark

Ark, an R kernel

Bump tree-sitter-r 4 - Syntax vs Semantic Diagnostics #523

Closed DavisVaughan closed 1 month ago

DavisVaughan commented 2 months ago

Addresses https://github.com/posit-dev/positron/issues/2943

This PR also contains https://github.com/posit-dev/ark/pull/529 which was merged into it

Pulls in a whopping 34 more commits from tree-sitter-r https://github.com/r-lib/tree-sitter-r/compare/63ee9b10de3b1e4dfaf40e36b45e9ae3c9ed8a4f...99bf614d9d7e6ac9c7445fa7dc54a590fcdf3ce0, but they really amount to only 3 main changes. We pull in all 3 at once because they are somewhat intermingled with each other.

With these 3 changes, I was able to greatly improve our diagnostics engine.

It has been split into two parts: a syntax path and a semantic path.

In a future PR I'll do a mostly pure "rearrangement" PR to clean this structure up a bit more. I haven't done that yet to make it clear what has been moved out of diagnostics.rs.

Syntax diagnostics

Diagnostics based purely on ERROR and MISSING nodes in the tree-sitter AST.

A few more improvements in this realm:

#> ── Text ────────────────────────────────────────────────────────────────────────
#> 1 + }
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 5)]
#>   (float [(0, 0), (0, 1)])
#>   (ERROR [(0, 2), (0, 5)]
#>     "+" [(0, 2), (0, 3)]
#>     (ERROR [(0, 4), (0, 5)])
#>   )
#> )
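As a rough illustration of what "diagnostics based purely on ERROR and MISSING nodes" can look like, here is a minimal, self-contained sketch. It does not use the real tree-sitter API: `Node`, `Kind`, `Diagnostic`, `syntax_diagnostics()`, and `example_tree()` are all hypothetical names invented for this example, and the tree is hand-built to mirror the S-expression for `1 + }` above. It simply collects every ERROR or MISSING node it finds; which of those to actually surface to the user is a policy decision that is not modeled here.

```rust
// Toy stand-in for a tree-sitter node. This is NOT the real `tree_sitter::Node`.
type Point = (usize, usize); // (row, column)

#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Other(&'static str),   // any regular node or token, e.g. "program", "float", "+"
    Error,                 // models a tree-sitter ERROR node
    Missing(&'static str), // models a MISSING node, naming the expected token
}

#[derive(Debug, Clone)]
struct Node {
    kind: Kind,
    start: Point,
    end: Point,
    children: Vec<Node>,
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    start: Point,
    end: Point,
    message: String,
}

// Depth-first walk that records one diagnostic per ERROR or MISSING node.
fn syntax_diagnostics(node: &Node, out: &mut Vec<Diagnostic>) {
    let message = match &node.kind {
        Kind::Error => Some(String::from("Syntax error")),
        Kind::Missing(what) => Some(format!("Missing {}", what)),
        Kind::Other(_) => None,
    };
    if let Some(message) = message {
        out.push(Diagnostic { start: node.start, end: node.end, message });
    }
    for child in &node.children {
        syntax_diagnostics(child, out);
    }
}

// Hand-built copy of the tree printed above for the text `1 + }`.
fn example_tree() -> Node {
    let leaf = |kind, start, end| Node { kind, start, end, children: vec![] };
    Node {
        kind: Kind::Other("program"),
        start: (0, 0),
        end: (0, 5),
        children: vec![
            leaf(Kind::Other("float"), (0, 0), (0, 1)),
            Node {
                kind: Kind::Error,
                start: (0, 2),
                end: (0, 5),
                children: vec![
                    leaf(Kind::Other("+"), (0, 2), (0, 3)),
                    leaf(Kind::Error, (0, 4), (0, 5)),
                ],
            },
        ],
    }
}
```

Calling `syntax_diagnostics(&example_tree(), &mut out)` on an empty `Vec` yields two entries, one for the outer ERROR spanning `+ }` and one for the inner ERROR on `}`.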

Semantic diagnostics

Semantic diagnostics now run in a separate path from syntax diagnostics. This has a really nice benefit: we only run semantic diagnostics on top-level expressions (i.e. children of root) for which node.has_error() returns false. In other words, we only consider running semantic diagnostics down a section of the tree if we know that section does not contain any syntax errors.
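The filtering idea can be sketched as follows. This is a toy model, not the code in this PR: `Node`, its `is_error` flag, `semantic_candidates()`, and `example_program()` are hypothetical, and `has_error()` here only imitates the behavior of the tree-sitter method of the same name (true when the node is, or contains, a syntax error).

```rust
// Toy node type. `is_error` marks a node that is itself a syntax error.
struct Node {
    label: &'static str,
    is_error: bool,
    children: Vec<Node>,
}

impl Node {
    // True if this node or any descendant is a syntax error.
    fn has_error(&self) -> bool {
        self.is_error || self.children.iter().any(|child| child.has_error())
    }
}

// Keep only the top-level expressions that are entirely free of syntax errors.
// Only these would be handed to the semantic pass.
fn semantic_candidates(root: &Node) -> Vec<&Node> {
    root.children
        .iter()
        .filter(|child| !child.has_error())
        .collect()
}

// A program with three top-level expressions; the middle one contains an error.
fn example_program() -> Node {
    let leaf = |label, is_error| Node { label, is_error, children: vec![] };
    Node {
        label: "program",
        is_error: false,
        children: vec![
            leaf("x <- 1", false),
            Node {
                label: "1 + }",
                is_error: false,
                // The error lives in a descendant, so `has_error()` must recurse.
                children: vec![leaf("}", true)],
            },
            leaf("foo(y)", false),
        ],
    }
}
```

With this input, `semantic_candidates(&example_program())` keeps `x <- 1` and `foo(y)` and skips `1 + }`, so a broken expression never prevents semantic checks on its well-formed siblings.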

This actually works quite nicely in practice.

Improvement examples

This shows the "truncation" idea once a syntax error spans >20 rows

https://github.com/user-attachments/assets/cb3d80cc-0712-4294-ac07-2f2b6fa0128d

This shows improvements in the example in https://github.com/posit-dev/positron/issues/2943, which used to light up the whole file

https://github.com/user-attachments/assets/bd6fab72-888b-492e-9289-f84c5b4ed124

This is a tree-sitter-r test file, with many syntax errors. Note that 1) it doesn't light up the whole file and 2) it still shows some semantic issues too (symbol not found errors)

https://github.com/user-attachments/assets/40cc5fca-e87d-49cd-9709-eb34a97181df

Improvements on the example from https://github.com/posit-dev/positron/discussions/4177. We often target the missing opening/closing node now. At the very end it reports that } doesn't have a matching opening {. I do think that is still a technically correct syntax error message, even if we'd rather point at the unmatched (. Positron does at least highlight that unmatched ( in red here, which is nice.

https://github.com/user-attachments/assets/b622440a-dc48-47ba-94a0-627d7815fe28