tree-sitter / tree-sitter-c

C grammar for tree-sitter
MIT License
225 stars 100 forks source link

fix: allow preprocessor directives in an `enum_specifier` #182

Closed jcreekmore closed 7 months ago

jcreekmore commented 8 months ago

Preprocessor directives inside of an enum_specifier variant list cause ERROR nodes to appear in the syntax tree. Both the struct_specifier and the union_specifier allow the preprocessor directives to exist to selectively enable or disable fields in the struct or union. It is fairly common in C to also selectively add variants to an enum in the same manner.

This modifies the grammar for the enumerator_list so that the preprocessor directives are allowed. My initial approach was to just make the commas optional at the end of the enumerator and, while that did allow a simple _enumerator_list_item in much the same style as the _field_declaration_list_item, it was a change in behavior since it did not require a comma in between every enum variant. By splitting the _enumerator_list_item into a version with a comma required and an optional _enumerator_list_item_end that does NOT require a comma, this keeps the same behvior of requiring the comma separated list of variants without requiring a comma at the end of the list of variants. Additionally, it does not require a comma at the end of the preprocessor directives, which would not be correct either. The conflict additions were required to get the grammar to compile with the two different versions of enumerator_list_item.

Additionally, the modifications to the grammar introduced some ambiguity in the precedence of the concatenated_string, so right precedence was remove that ambiguity.

Fixes #181

jcreekmore commented 8 months ago

The AppVeyor failure appears to be completely unrelated to the changes in this PR (looks like some Python/Node environmental issue).

While the PR diff looks terrible, the actual changes I made were in grammar.js and test/corpus/preprocessor.txt. The rest of the changes are from running tree-sitter generate with the latest master tree-sitter cli.

amaanq commented 7 months ago

Hey, thanks for the PR. I tweaked it a bit since the conflicts and additions were a bit excessive - I think only preproc ifs and ifdefs should be here, despite technically defines and other stuff being allowed, for the purpose of not increasing the state count too much. This is now fixed on master, thanks!

jcreekmore commented 7 months ago

So, I agree with you on the preprocessor defines and function defines. However, the preproc_call are also part of what is tripping me up (I should have included that in the test, but I didn't). The code I am trying to parse has both ifdefs and calls because it is trying to build up the enumeration piece-meal something like this:

#define FOO_BUILDER(sym) FOO_##sym

enum foo {
#ifdef A
    FOO_BUILDER(BAR),
#else
    FOO_BUILDER(MAX),
#endif
    FOO_END,
};

That is also throwing the error nodes (it is kind of a dumb example, but a simplified version of what I am seeing in the codebase). The following diff (still smaller than the full PR branch I had) resolves it:

diff --git a/grammar.js b/grammar.js
index d223f35..881406e 100644
--- a/grammar.js
+++ b/grammar.js
@@ -67,6 +67,7 @@ module.exports = grammar({
     [$.enum_specifier],
     [$._type_specifier, $._old_style_parameter_list],
     [$.parameter_list, $._old_style_parameter_list],
+    [$.enumerator_list],
   ],

   word: $ => $.identifier,
@@ -622,12 +623,14 @@ module.exports = grammar({
       '{',
       repeat(choice(
         seq($.enumerator, ','),
+        $.preproc_call,
         alias($.preproc_if_in_enumerator_list, $.preproc_if),
         alias($.preproc_ifdef_in_enumerator_list, $.preproc_ifdef),
       )),
       optional(seq(
         choice(
           $.enumerator,
+          $.preproc_call,
           alias($.preproc_if_in_enumerator_list_no_comma, $.preproc_if),
           alias($.preproc_ifdef_in_enumerator_list_no_comma, $.preproc_ifdef),
         ),
amaanq commented 7 months ago

ah my bad, I should've added calls as well.

Can we go with seq($.preproc_call, ',') for the first instance? I hope usage of this in macros won't be adding the comma to it, and that avoids the conflict

jcreekmore commented 7 months ago

Yeah, doing seq($.preproc_call, ',') for the first instance allowed me to remove the conflict. Do you want me to spin up another PR with that change, or do you want to just do it?

amaanq commented 7 months ago

done