tree-sitter / tree-sitter-c

C grammar for tree-sitter
MIT License

Handling of preprocessor macros is not general enough #108

Open bbannier opened 2 years ago

bbannier commented 2 years ago

While looking into how one could tackle zeek/tree-sitter-zeek#6, I looked at this grammar for inspiration and noticed that it has similar issues. In C or C++, preprocessor macros can appear around pretty much any token of the language, while this grammar only allows for them in a couple of places. I wonder what the best approach to this would be.

As an example, the following source file

int
#if 0
foo
#else
main
#endif
(void) {}

produces this AST

(translation_unit
  (ERROR
    (primitive_type))
  (preproc_if
    (number_literal)
    (ERROR
      (identifier))
    (preproc_else)
    (ERROR
      (identifier)))
  (expression_statement
    (compound_literal_expression
      (type_descriptor

One could come up with nastier examples where, e.g., an opening parenthesis is inside a preprocessor block. I am not even sure what the resulting AST should look like, but I feel like I might want something that can support preprocessor directives anywhere, with more structure than what extras is typically used for. Would there be a way to support this with an external scanner?
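A minimal sketch of such a nastier case, assuming a hypothetical USE_LONG switch and a hypothetical function named twice (neither comes from the grammar or the issue). The compiler accepts it, but neither branch of the conditional contains a complete declarator on its own:

```c
/* USE_LONG is a hypothetical switch, introduced only for this sketch. */
#ifndef USE_LONG
#define USE_LONG 0
#endif

/* The return type, the function name, and the opening parenthesis all sit
   inside the conditional; the parameter list, the closing parenthesis, and
   the body sit outside it. */
#if USE_LONG
long twice(
#else
int twice(
#endif
    int x) {
    return x * 2;
}
```

After preprocessing with USE_LONG set to 0, this is just an ordinary definition of int twice(int x), so the difficulty only exists for a parser that sees the directives and the code together.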

There is also already #13, but it seems to be more focussed on improving the handling of the currently supported special cases.

tr-intel commented 4 months ago

Here's a typical scenario where we come across this problem.

#ifdef __cplusplus
extern "C" {
#endif

#ifdef __cplusplus
}
#endif

AST: https://tree-sitter.github.io/tree-sitter/playground#

translation_unit [0, 0] - [8, 0]
  preproc_ifdef [0, 0] - [6, 6]
    name: identifier [0, 7] - [0, 18]
    linkage_specification [1, 0] - [5, 1]
      value: string_literal [1, 7] - [1, 10]
        string_content [1, 8] - [1, 9]
      body: declaration_list [1, 11] - [5, 1]
        preproc_call [2, 0] - [3, 0] <<<<<<<<< 🧐
          directive: preproc_directive [2, 0] - [2, 6]
        preproc_ifdef [4, 0] - [4, 18]
          name: identifier [4, 7] - [4, 18]
          MISSING #endif [4, 18] - [4, 18]  <<<<<<<<< 🧐
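
For context, a sketch of the complete pattern this snippet is usually cut from, assuming a hypothetical function add declared between the two guards. Each conditional holds only one half of the brace pair, so no single #ifdef block contains a well-formed construct on its own:

```c
/* Opening half of the linkage block, only seen by a C++ compiler. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, introduced only for this sketch. */
int add(int a, int b);

/* Closing half of the linkage block. */
#ifdef __cplusplus
}
#endif

int add(int a, int b) {
    return a + b;
}
```

Compiled as C, both guards vanish and only the declaration and definition of add remain. Compiled as C++, the braces pair up across the two conditionals.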