Closed tamasvajk closed 1 year ago
Doing the opposite and removing all field()
calls from the grammar has significant impact on the parser size:
--- a/script/file_sizes.txt
+++ b/script/file_sizes.txt
@@ -1,5 +1,5 @@
-src/grammar.json 0.2MB 10954
-src/node-types.json 0.1MB 7580
-src/parser.c 47.9MB 1505318
+src/grammar.json 0.2MB 10098
+src/node-types.json 0.1MB 6307
+src/parser.c 38.8MB 1221658
src/scanner.c 0.0MB 37
-total 48.3MB 1523889
+total 39.1MB 1238100
Doing that though will break a lot of the usefulness of the project and break GitHub semantic use no?
Doing that though will break a lot of the usefulness of the project and break GitHub semantic use no?
Yes, sure. I'm not saying we should do this. Only highlighting that field names contribute significantly to the parser size.
Yeah it's weird, I didn't imagine they would add much at all to the size.
Yeah it's weird, I didn't imagine they would add much at all to the size.
It looks like not all field names cause issues: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/314.
@hendrikvanantwerpen Thanks for the review.
Existing fields for expressions use value for the field name. I would suggest sticking with that convention instead of using expression. I feel a bit bad, because when marking them I realized there are many places that needs changing, but I think being consistent is important for long-term maintainability.
No worries that there are many places that need changing, as you said it's important for maintainability. Before I change them, I'll raise one additional concern: I think we're being consistent with the current expression
vs value
distinctions; we're following what Roslyn is doing. (I would need to check that I fully matched Roslyn everywhere) For example Roslyn uses Value
for 6
and Expression
for 5 in the below example. See in the Syntax Tree view here.
[A(i = 5)]
abstract void M(int j = 6);
@tamasvajk Perhaps I judged too quickly? If this grammar explicitly tries to follow Roslyn (or even de facto did until now) I am all for aligning with them!
In that case, I would even consider changing fields to align with them, if it's only a few fields that deviate.
The only reason I followed the Roslyn grammar was to make changes in future C# easier as we can follow what rules change. (It's been pretty useful for C# 9, 10, 11)
If however that's getting in the way of size/usability we should deviate.
I agree that it is valuable to follow the Roslyn grammar of feasible. Perhaps it makes sense to call this out in the readme, and make it a bit more official? Wording along @damieng's comment above should be fine, I think
Thanks for all the feedback on this PR. I think it might be easier if we do this step-by-step. In the first step I'm introducing only the fields that have visible impact on some use-cases, such as highlighting. So I'm leaving the fields with field('type'
. I also renamed field('element_type'
to field('type'
, which actually results in some highlighting improvement: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/316
Here's another followup PR to add fields to name-like constructs: https://github.com/tree-sitter/tree-sitter-c-sharp/pull/318
This PR adds some more named fields to the grammar. (I couldn't add
field('attributes', $.attribute_list)
because that significantly increases the parser size.)Additionally, the PR simplifies the highlighting of types by using the fields named
type
((_ type: (identifier) @type)
) instead of defining individually for each construct where types are located ((identifier) @type
in individual constructs).