tree-sitter / tree-sitter-c

C grammar for tree-sitter
MIT License
225 stars 100 forks source link

bug: Comma operator in conditional_expression #187

Closed mingodad closed 6 months ago

mingodad commented 7 months ago

Did you check existing issues?

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.20.9 (df1fe842eb08232f4119deba7aae973b7782530d)

Describe the bug

While testing this parser with sqlite3.c preprocessed I found this problem and also this related issue https://github.com/tree-sitter/tree-sitter-c/issues/31 and following it's solution I changed the grammar like shown bellow and got it to parse (probably the same apply to several other places).

---------------------------------- grammar.js ----------------------------------
index 2a9fe20..2f64cc9 100644
@@ -930,7 +930,7 @@ module.exports = grammar({
     conditional_expression: $ => prec.right(PREC.CONDITIONAL, seq(
       field('condition', $._expression),
       '?',
-      optional(field('consequence', $._expression)),
+      optional(field('consequence', choice($._expression, $.comma_expression))),
       ':',
       field('alternative', $._expression),
     )),

Steps To Reproduce/Bad Parse Tree

tree-sitter parse test-cond.c

Expected Behavior/Parse Tree

(translation_unit [0, 0] - [1, 0] (expression_statement [0, 0] - [0, 20] (assignment_expression [0, 0] - [0, 19] left: (identifier [0, 0] - [0, 1]) right: (conditional_expression [0, 4] - [0, 19] condition: (identifier [0, 4] - [0, 7]) consequence: (comma_expression [0, 10] - [0, 15] left: (assignment_expression [0, 10] - [0, 13] left: (identifier [0, 10] - [0, 11]) right: (number_literal [0, 12] - [0, 13])) right: (number_literal [0, 14] - [0, 15])) alternative: (number_literal [0, 18] - [0, 19])))))

Repro

//From sqlite3
//nHeader += (u8)(((u32)(nPayload)<(u32)0x80)?(*(&pCell[nHeader])=(unsigned char)(nPayload)),1: sqlite3PutVarint((&pCell[nHeader]),(nPayload)));
a = val ? b=3,1 : 0;
mingodad commented 7 months ago

Also to be able to parse preprocessed sqlite3.c we need this https://github.com/tree-sitter/tree-sitter-c/issues/183#issuecomment-1933809959 and also something like shown bellow for when using gcc to preprocess:

---------------------------------- grammar.js ----------------------------------
index 2a9fe20..28c2e73 100644
@@ -743,6 +743,7 @@ module.exports = grammar({
       $.while_statement,
       $.for_statement,
       $.return_statement,
+      $.fallthrough_statement,
       $.break_statement,
       $.continue_statement,
       $.goto_statement,
@@ -852,6 +853,10 @@ module.exports = grammar({
       ';',
     ),

+    fallthrough_statement: _ => seq(
+      '__attribute__((fallthrough))', ';',
+    ),
+
     break_statement: _ => seq(
       'break', ';',
     ),
mingodad commented 7 months ago

With all the above changes the remaining problem to parse preprocessed sqlite3.c is this:

extern int _IO_vfprintf (_IO_FILE *__restrict, const char *__restrict,
    __gnuc_va_list);

And it seems that is something related with __restrict because if we replace __restrict by restrict then it parses:

int vs(const char *__restrict, va_list);
[translation_unit](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [1, 0]
  [declaration](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [0, 40]
    type: [primitive_type](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [0, 3]
    declarator: [function_declarator](https://tree-sitter.github.io/tree-sitter/playground#) [0, 4] - [0, 39]
      declarator: [identifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 4] - [0, 6]
      parameters: [parameter_list](https://tree-sitter.github.io/tree-sitter/playground#) [0, 6] - [0, 39]
        [parameter_declaration](https://tree-sitter.github.io/tree-sitter/playground#) [0, 7] - [0, 38]
          [type_qualifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 7] - [0, 12]
          type: [primitive_type](https://tree-sitter.github.io/tree-sitter/playground#) [0, 13] - [0, 17]
          declarator: [pointer_declarator](https://tree-sitter.github.io/tree-sitter/playground#) [0, 18] - [0, 38]
            [ms_pointer_modifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 19] - [0, 29]
              [ms_restrict_modifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 19] - [0, 29]
            [ERROR](https://tree-sitter.github.io/tree-sitter/playground#) [0, 29] - [0, 30]
            declarator: [identifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 31] - [0, 38]
int vs(const char *restrict, va_list);
[translation_unit](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [1, 0]
  [declaration](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [0, 38]
    type: [primitive_type](https://tree-sitter.github.io/tree-sitter/playground#) [0, 0] - [0, 3]
    declarator: [function_declarator](https://tree-sitter.github.io/tree-sitter/playground#) [0, 4] - [0, 37]
      declarator: [identifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 4] - [0, 6]
      parameters: [parameter_list](https://tree-sitter.github.io/tree-sitter/playground#) [0, 6] - [0, 37]
        [parameter_declaration](https://tree-sitter.github.io/tree-sitter/playground#) [0, 7] - [0, 27]
          [type_qualifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 7] - [0, 12]
          type: [primitive_type](https://tree-sitter.github.io/tree-sitter/playground#) [0, 13] - [0, 17]
          declarator: [abstract_pointer_declarator](https://tree-sitter.github.io/tree-sitter/playground#) [0, 18] - [0, 27]
            [type_qualifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 19] - [0, 27]
        [parameter_declaration](https://tree-sitter.github.io/tree-sitter/playground#) [0, 29] - [0, 36]
          type: [type_identifier](https://tree-sitter.github.io/tree-sitter/playground#) [0, 29] - [0, 36]

Grammar:

...
    ms_restrict_modifier: _ => '__restrict',

    ms_unsigned_ptr_modifier: _ => '__uptr',

    ms_signed_ptr_modifier: _ => '__sptr',

    ms_unaligned_ptr_modifier: _ => choice('_unaligned', '__unaligned'),

    ms_pointer_modifier: $ => choice(
      $.ms_unaligned_ptr_modifier,
      $.ms_restrict_modifier,
      $.ms_unsigned_ptr_modifier,
      $.ms_signed_ptr_modifier,
    ),
...
    type_qualifier: _ => choice(
      'const',
      'constexpr',
      'volatile',
      'restrict',
      '__restrict__',
      '__extension__',
      '_Atomic',
      '_Noreturn',
      'noreturn',
    ),
...
mingodad commented 7 months ago

Also the preproc_line is missing all of the above was preprocessed with -P to omit emit line info. This parser doesn't recognize preproc_line

# 40 "/usr/lib/gcc/x86_64-linux-gnu/9/include/stdarg.h" 3 4
typedef __builtin_va_list __gnuc_va_list;
# 99 "/usr/lib/gcc/x86_64-linux-gnu/9/include/stdarg.h" 3 4
typedef __gnuc_va_list va_list;
# 346 "sqlite3.c" 2
# 495 "sqlite3.c"
mingodad commented 7 months ago

I found the problem with __restrict, it's a missing repeat($.ms_pointer_modifier), in abstract_pointer_declarator see diff bellow.

---------------------------------- grammar.js ----------------------------------
index 2a9fe20..4b1a691 100644
@@ -441,6 +441,7 @@ module.exports = grammar({
       field('declarator', $._type_declarator),
     ))),
     abstract_pointer_declarator: $ => prec.dynamic(1, prec.right(seq('*',
+      repeat($.ms_pointer_modifier),
       repeat($.type_qualifier),
       field('declarator', optional($._abstract_declarator)),
     ))),
mingodad commented 7 months ago

Also adding a rule for linemark allow it to parse preprocessed files with linemarks (see https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html):

---------------------------------- grammar.js ----------------------------------
index 2a9fe20..5de719e 100644
@@ -40,6 +40,7 @@ module.exports = grammar({
   extras: $ => [
     /\s|\\\r?\n/,
     $.comment,
+    $.preproc_linemark,
   ],

   inline: $ => [
@@ -110,6 +111,32 @@ module.exports = grammar({
     ),

     // Preprocesser
+
+    preproc_linemark: $ =>
+       /# \d+ "(\\.|[^\\"\r\n\\])+"( [1234]){0,4}\r?\n/
+    ,
+
     preproc_include: $ => seq(
       preprocessor('include'),
amaanq commented 6 months ago

I don't think we want to consider preprocessed code files, that's just asking to handle too much with compiler intrinsics, line markers, etc