lezer-parser / cpp

C++ grammar for the Lezer parser system
MIT License
19 stars 4 forks source link

wrong affinity of pointer-star in function type #3

Closed milahu closed 1 year ago

milahu commented 1 year ago

source

void* some_function() {}

actual parse tree

FunctionDefinition: "void* some_function() {"
  PrimitiveType: "void"
  PointerDeclarator: "* some_function()"
    FunctionDeclarator: "some_function()"
      Identifier: "some_function"
      ParameterList: "()"
        (: "("
        ): ")"
  CompoundStatement: "{"
    {: "{"
    }: "}"

actual affinity: void (* some_function) expected affinity: (void *) some_function

actual vs expected parse tree

 FunctionDefinition: "void* some_function() {"
-  PrimitiveType: "void"
-  PointerDeclarator: "*some_function()"
-    FunctionDeclarator: "some_function()"
+  CompoundType: "void*"
+    PrimitiveType: "void"
+    PointerStar: "*"
+  FunctionDeclarator: "some_function()"
       Identifier: "some_function"
       ParameterList: "()"
         (: "("

same bug in tree-sitter-cpp, see playground

expected parse tree per clang

{
  "kind": "FunctionDecl",
  "name": "some_function",
  "type": {
    "qualType": "void *()"
  },
  "inner": [
    {
      "kind": "CompoundStmt",
    }
  ]
}
marijnh commented 1 year ago

This BNF grammar also seems to group the * under the declarator, so I'm not sure this is a bug as such.

milahu commented 1 year ago

then that grammar is wrong too. other grammars:

for my case (cpp2js transpiler) i also care about semantics, so i use the clang parser as reference

parsing the example code void* some_function() {} as "pointer declarator" does not make sense

an actual "pointer declarator" should be called "function pointer declarator" declaration and call need parens for (*function_name)

#include <stdio.h> // printf
#include <stdlib.h> // malloc

void f1(int x) {
  printf("f1: x = %i\n", x);
}

void f2(int x) {
  printf("f2: x = %i\n", x);
}

struct s1 {
  // declaration
  void (*f1_ptr)(int x);
  void (*f2_ptr)(int x);
};

int main(int argc, char** argv) {
  int x = argc;
  struct s1 *s1_ptr = malloc(sizeof(struct s1));

  void (*f1_ptr)(int x) = &f1; // declaration + init
  (*f1_ptr)(x); // call
  x++;

  // arrow syntax
  s1_ptr->f1_ptr = &f1; // init
  (*s1_ptr->f1_ptr)(x); // call
  x++;

  // dot syntax
  (*s1_ptr).f2_ptr = &f2; // init
  (*(*s1_ptr).f2_ptr)(x); // call
  x++;

  free(s1_ptr);
  return 0;
}
marijnh commented 1 year ago

then that grammar is wrong too.

That seems like a very blunt dismissal of a generally respected resource. I don't see the structure you are proposing in https://github.com/antlr/grammars-v4/tree/master/cpp either. C++ is extremely weird to parse, so the parse tree and the abstract syntax tree in Clang differing is not terribly surprising. I don't have the expertise on this language to make a definite call here, but I am not prepared to merge this change without a much more solid rationale for it.

milahu commented 1 year ago

C++ is extremely weird to parse

these are simple function-pointers in C

That seems like a very blunt dismissal

thats because im 100% sure (edit: at least im sure that cpp.lezer is wrong here)

if you remove the parens around (*funcion_name) in my example

 struct s1 {
   // declaration
   void (*f1_ptr)(int x);
-  void (*f2_ptr)(int x);
+  void* f2_ptr(int x);
 };

 int main(int argc, char** argv) {

then the pointer-star is left-associative, and compilation fails:

bad.c:15:9: error: field ‘f2_ptr’ declared as a function
   15 |   void* f2_ptr(int x);
      |         ^~~~~~

declared as a function

is an error, because in a struct C expects a function-pointer

so ...

problem: pointer-star is right-associative expected: pointer-star is left-associative

marijnh commented 1 year ago

I'm going to decline this until you can provide a (somewhat authoritative) grammar that works this way, on the assumption that deviating from the grammars we were working from is certainly going to cause obscure regressions.

milahu commented 5 months ago

your decision was correct, also confirmed by https://github.com/tree-sitter/tree-sitter-c/issues/129#issuecomment-2113033597

int* x, y;

In C, this declares a pointer (x) to an int, and an int (y), i.e. the "star" only apply to x.

apparently clang also does some semantic processing, not just parsing