tree-sitter / tree-sitter-c-sharp

C# Grammar for tree-sitter
MIT License
177 stars 47 forks source link

Add some more named fields to the grammar #300

Closed tamasvajk closed 1 year ago

tamasvajk commented 1 year ago

This PR adds some more named fields to the grammar. (I couldn't add field('attributes', $.attribute_list) because that significantly increases the parser size.)

Additionally, the PR simplifies the highlighting of types by using the fields named type ((_ type: (identifier) @type)) instead of defining individually for each construct where types are located ((identifier) @type in individual constructs).

tamasvajk commented 1 year ago

Doing the opposite and removing all field() calls from the grammar has significant impact on the parser size:

--- a/script/file_sizes.txt
+++ b/script/file_sizes.txt
@@ -1,5 +1,5 @@
-src/grammar.json       0.2MB        10954
-src/node-types.json    0.1MB         7580
-src/parser.c           47.9MB     1505318
+src/grammar.json       0.2MB        10098
+src/node-types.json    0.1MB         6307
+src/parser.c           38.8MB     1221658
 src/scanner.c          0.0MB           37
-total                  48.3MB     1523889
+total                  39.1MB     1238100
damieng commented 1 year ago

Doing that though will break a lot of the usefulness of the project and break GitHub semantic use no?

tamasvajk commented 1 year ago

Doing that though will break a lot of the usefulness of the project and break GitHub semantic use no?

Yes, sure. I'm not saying we should do this. Only highlighting that field names contribute significantly to the parser size.

damieng commented 1 year ago

Yeah it's weird, I didn't imagine they would add much at all to the size.

tamasvajk commented 1 year ago

Yeah it's weird, I didn't imagine they would add much at all to the size.

It looks like not all field names cause issues: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/314.

tamasvajk commented 1 year ago

@hendrikvanantwerpen Thanks for the review.

Existing fields for expressions use value for the field name. I would suggest sticking with that convention instead of using expression. I feel a bit bad, because when marking them I realized there are many places that needs changing, but I think being consistent is important for long-term maintainability.

No worries that there are many places that need changing, as you said it's important for maintainability. Before I change them, I'll raise one additional concern: I think we're being consistent with the current expression vs value distinctions; we're following what Roslyn is doing. (I would need to check that I fully matched Roslyn everywhere) For example Roslyn uses Value for 6 and Expression for 5 in the below example. See in the Syntax Tree view here.

[A(i = 5)]
abstract void M(int j = 6);
hendrikvanantwerpen commented 1 year ago

@tamasvajk Perhaps I judged too quickly? If this grammar explicitly tries to follow Roslyn (or even de facto did until now) I am all for aligning with them!

In that case, I would even consider changing fields to align with them, if it's only a few fields that deviate.

damieng commented 1 year ago

The only reason I followed the Roslyn grammar was to make changes in future C# easier as we can follow what rules change. (It's been pretty useful for C# 9, 10, 11)

If however that's getting in the way of size/usability we should deviate.

hendrikvanantwerpen commented 1 year ago

I agree that it is valuable to follow the Roslyn grammar of feasible. Perhaps it makes sense to call this out in the readme, and make it a bit more official? Wording along @damieng's comment above should be fine, I think

tamasvajk commented 1 year ago

Thanks for all the feedback on this PR. I think it might be easier if we do this step-by-step. In the first step I'm introducing only the fields that have visible impact on some use-cases, such as highlighting. So I'm leaving the fields with field('type'. I also renamed field('element_type' to field('type', which actually results in some highlighting improvement: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/316

tamasvajk commented 1 year ago

Here's another followup PR to add fields to name-like constructs: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/318