tree-sitter / tree-sitter-c-sharp

C# Grammar for tree-sitter
MIT License

Parse error on classes with primary constructors (C# 12, .NET 8 LTS) #329

Closed neuromagus closed 2 months ago

neuromagus commented 2 months ago

https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/tutorials/primary-constructors

tree-sitter does not parse a record class declared with primary constructor syntax. The same problem may also affect structs.

[Screenshot: 20240409_01h13m21s_grim]

P.S. Please don't ask me about the theme, I'm still trying to configure nvim ;} How do I get out?

neuromagus commented 2 months ago

I understand about this:

C# 12.0 (under development)

but all plugins that use tree-sitter (for C#, I mean) run into UB. For example, this one, et cetera... (The Emacs universe is another question.) I read the code a little, and... Guys! C# is complicated by syntactic sugar compared to Java, sure, but parser.c at 42 MB versus 2.5 MB? So C# syntax is roughly 17 times heavier?

I understand: it's a DSL, and .NET gets updates every year. If you have any instructions for learning this DSL and/or proper worked examples, I'm in. I have time to help. IMHO, maybe it's worth creating a coherent structure and describing it in the documentation? The language is in good shape and developing well, a transition to FP is underway, and new syntax and constructs will keep being added.

P.S. Sorry for my English.

tamasvajk commented 2 months ago

https://github.com/tree-sitter/tree-sitter-c-sharp/issues/273#issuecomment-1378967268 references a commit that increased the parser size significantly.

neuromagus commented 2 months ago

Well, I asked an AI (this is a completely new subject to me), and

this is the AI's response to your post:

The conflict arises because the parser generator is unable to determine whether to interpret the sequence '*' _lvalue_expression '=' as a pointer indirection expression followed by an assignment, or as the start of an assignment expression with a dereferenced lvalue.

Here's the breakdown of the conflict:

  1. The sequence starts with '*', which could be the start of a pointer indirection expression (_pointer_indirection_expression).
  2. After '*', there is an _lvalue_expression, which is a valid continuation of both a pointer indirection expression and an assignment expression.
  3. The next token is '=', which creates the ambiguity:
    • It could be interpreted as the start of an assignment operator within an assignment expression (assignment_expression).
    • Or, it could be treated as a separate assignment operator following a complete pointer indirection expression.

The parser generator is not able to automatically resolve this ambiguity based on the precedence rules alone. It needs additional information to decide how to interpret the sequence.

Regarding your question about why '*' _lvalue_expression isn't reduced to _pointer_indirection_expression due to its higher precedence, it's because the parser generator doesn't have enough lookahead information at that point to make the decision. It needs to consider the token following the _lvalue_expression to determine whether it should reduce to _pointer_indirection_expression or continue parsing an assignment expression.

Blah-blah... Blah-blah...

In summary, the conflict arises due to the ambiguity in interpreting the sequence '*' _lvalue_expression '=', and the parser generator needs additional information or explicit conflict resolution to determine the correct parse.
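
To make the ambiguity concrete, here is a minimal sketch in plain JavaScript. It assumes one plausible reading of the two interpretations described above: either `* p` is reduced before `=` is consumed, or the parser keeps shifting and the assignment ends up under `*`. The helper names (`deref`, `assign`, `tokensOf`) and the node shapes are hypothetical and are not taken from the real grammar.js; the node type names only echo the rule names quoted above.

```javascript
// Toy AST builders. The shapes are assumptions for illustration only,
// not the node layout produced by tree-sitter-c-sharp.
const deref = (operand) => ({ type: 'pointer_indirection_expression', operand });
const assign = (left, right) => ({ type: 'assignment_expression', left, right });

// Flatten a tree back into the token sequence it would have been parsed from.
function tokensOf(node) {
  if (typeof node === 'string') return [node]; // leaf: an identifier
  if (node.type === 'pointer_indirection_expression') {
    return ['*', ...tokensOf(node.operand)];
  }
  return [...tokensOf(node.left), '=', ...tokensOf(node.right)];
}

// Choice 1: reduce `* p` as soon as it is complete, then parse the assignment.
const reduceFirst = assign(deref('p'), 'v'); // (*p) = v

// Choice 2: keep shifting past `=`, so `*` applies to the whole assignment.
const shiftFirst = deref(assign('p', 'v')); // *(p = v)

// Both trees flatten to the same tokens, which is why one token of
// lookahead cannot tell them apart without extra rules.
console.log(tokensOf(reduceFirst).join(' '));
console.log(tokensOf(shiftFirst).join(' '));
console.log(reduceFirst.type, 'vs', shiftFirst.type);
```

Both trees cover the same four tokens but have different roots, so the generator needs precedence, associativity, or an explicitly declared conflict to pick one.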


Well, I wonder how many other places like this there are in grammar.js? :(