tree-sitter / tree-sitter-c-sharp

C# Grammar for tree-sitter
MIT License
177 stars 47 forks source link

Rewrite the grammar #333

Closed amaanq closed 1 month ago

amaanq commented 1 month ago

I will update the tokens for publishing the new version later today (needs pypi).

amaanq commented 1 month ago

Wow, those two dynamic precedence issues caught me off guard. I noticed some places where prec.dynamic was used unnecessarily, and I thought those two were as well. That was really interesting and after a little digging, it makes sense. This is probably one of few grammars that is very sensitive to dynamic precedence changes. Thanks for the context, I've added them back

Others:

damieng commented 1 month ago

While I appreciate you've put a lot of effort in I'm also disappointed such a massive PR has landed with zero up-front discussion.

The extra plumbing around packaging and bringing in-line with other grammars is much appreciated but some of these changes would have been a lot easier to review with a bit of thought up-front - e.g. not re-ordering grammar.js where unnecessary and keeping the corpus files in their old location so we can diff (rename/move breaks after a certain amount of change).

Are there regressions with existing parsing? I can see a large list of exclusions on the CI but it's unclear if these are new.

amaanq commented 1 month ago

Well I wasn't aware I had to discuss making large improvements beforehand, nor have I ever done so in any other upstream grammar I maintain.

The moving of tests to a different dir is necessary, top-level corpus dirs are unsupported upstream now and must be in test/corpus.

The exclusion list is either a parse error where it genuinely was an error (beforehand as well) or it has funky usage of preproc ifs, e.g. in the middle of an if/else statement. I have (imo) improved them such that their contents are children of the preproc_if node, much like how C does it, however, this requires that the contents inside be somewhat correctly formed (e.g. a regular statement/expression, wherever the preproc_if is applicable). I think this is a reasonable tradeoff for much more navigable trees.

Tamás politely and graciously pointed out a couple of mistakes regarding some funky dynamic precedence, which I was a little taken aback by as I explained earlier, but those are now fixed.

amaanq commented 1 month ago

@damieng @tamasvajk the diff for tests should be much more readable now

maxbrunsfeld commented 1 month ago

Hope you're able to reduce the state count. In general, a great way to do that is avoid long sequences with many variations.

In other words, turn this:

seq(
  optional('foo'),
  optional('bar'),
  optional('baz'),
  optional('quux'),
  // ...
)

into this:

seq(
  repeat(choice('foo', 'bar', 'baz', 'quux')),
  // ...
)

It seems like there may be opportunities to do some of that in this grammar - there may be unnecessarily-specific sequences that could be modeled more generically.

amaanq commented 1 month ago

Down to 9700/3200 after reworking top level stuff + preprocs 😁

damieng commented 1 month ago

This is shaping up great! If we can get attribute_argument back in then I'm good on the Roslyn alignment/back compat from the naming side.

I think we just need to check/adjust the changes and removals of fields to make sure Semantic doesn't break and we should be good to go.

dcreager commented 1 month ago

I think we just need to check/adjust the changes and removals of fields to make sure Semantic doesn't break and we should be good to go.

Thanks for checking in, :+1: from our side! We're pinned to the current release so we won't silently upgrade, and we're using the syntax highlighting queries directly from this repo, which I see are updated as part of this PR. We have augmented the tagging queries with the ability to build up scoped names, which we'll have to update as part of bumping to the new version containing these changes. But the changes to tags.scm in this PR look manageable, so I don't consider that a blocker.

amaanq commented 1 month ago

thanks for the review/feedback @maxbrunsfeld @damieng @tamasvajk @hvitved!

hvitved commented 1 month ago
  • Improved preproc calls by having content be nested inside these nodes, but this does break the case where it's in the middle of an if statement for example, similar to C. I think the tradeoff is worth it, only ~80 files out of 8k+ files fail because of this, and we get much nicer parse trees in the common case, which, e.g., improves code folding. Another added benefit is we can use preproc rules in spots where only that rule itself is valid, and not everything. e.g. a preproc containing expressions is valid inside expressions, but this won't be done at the top level.

Is there any way, using extras as before this PR, to still tolerate #ifs that happen inside expressions or statements? For example, this file appears to have more parse errors now than before.