llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

libclang python: Missing VAR_DECL initializer if NULL present #54626

Open japendergrass opened 2 years ago

japendergrass commented 2 years ago

The libclang Python bindings consistently fail to return the initializer child cursor for a VAR_DECL cursor when NULL appears (apparently) anywhere in the initializer. This covers initializing a variable to NULL, to an array with at least one NULL element, or to a structure with a NULL field, and it seems to apply when these are nested as well.

Using ((void*)0) instead of NULL does not produce this issue, nor does using 0.

The following demo.py (apologies for not attaching; the GitHub page keeps producing an error when I try) illustrates the issue by dumping the children of the my_global variable for several sample C files.

$ for f in *.c; do python ./demo.py $f ; done
test1-bad.c: []
test1-good.c: [CursorKind.UNEXPOSED_EXPR]
test2-bad.c: []
test2-good.c: [CursorKind.INIT_LIST_EXPR]
test3-bad.c: [CursorKind.TYPE_REF]
test3-good.c: [CursorKind.TYPE_REF, CursorKind.INIT_LIST_EXPR]

Tested with versions 12.0.0 and 13.0.0 of the libclang Python bindings on Ubuntu 20.04.

demo.py

import sys
from clang.cindex import Index
idx  = Index.create()
tu   = idx.parse(sys.argv[1])
crsr = [c for c in tu.cursor.get_children() if c.spelling == "my_global"][0]
print("%s: %s" % (sys.argv[1], [child.kind for child in crsr.get_children()]))

test1-bad.c

#include <stddef.h>
char *my_global = NULL;

Output is the empty list but should be [CursorKind.UNEXPOSED_EXPR]

test1-good.c

#include <stddef.h>
char *my_global = 0;

test2-bad.c

#include <stddef.h>
char *my_global[] = {"hello", NULL};

Output is the empty list but should be [CursorKind.INIT_LIST_EXPR]

test2-good.c

#include <stddef.h>
char *my_global[] = {"hello", 0};

test3-bad.c

#include <stddef.h>
struct test {char *a; int b;};
struct test my_global[] = {{"hello", 7}, {NULL, 3}};

Output is [CursorKind.TYPE_REF] but should be [CursorKind.TYPE_REF, CursorKind.INIT_LIST_EXPR]

test3-good.c

#include <stddef.h>
struct test {char *a; int b;};
struct test my_global[] = {{"hello", 7}, {0, 3}};

japendergrass commented 2 years ago

This appears to impact get_children for a wide range of AST node types that may contain NULL as a subexpression. For example:

int foo(){return 0 == NULL;}
int foo(char *x){while(x == NULL){x = "hello";}}

Both exhibit similar behavior where the COMPOUND_STMT body of the function reports no children.
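
To see at which node the children disappear, it may help to dump the whole subtree rather than only the direct children. The following is a minimal sketch, assuming nothing beyond the cursor attributes demo.py already uses (kind, spelling, get_children). The names dump_tree and dump_file are hypothetical helpers, not part of the bindings.

```python
def dump_tree(cursor, depth=0, out=None):
    """Collect one indented line per cursor, walking depth-first."""
    if out is None:
        out = []
    out.append("%s%s %r" % ("  " * depth, cursor.kind, cursor.spelling))
    for child in cursor.get_children():
        dump_tree(child, depth + 1, out)
    return out


def dump_file(path, name="my_global"):
    """Parse `path` and print the subtree of every top-level cursor called `name`."""
    # Imported here so dump_tree can be used without libclang installed.
    from clang.cindex import Index

    tu = Index.create().parse(path)
    for c in tu.cursor.get_children():
        if c.spelling == name:
            print("\n".join(dump_tree(c)))
```

For the function examples, calling dump_file("some-file.c", name="foo") on a file containing one of them would show whether the COMPOUND_STMT line is followed by any indented children.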

japendergrass commented 2 years ago

This does not seem to affect the libclang.so installed by the Ubuntu package manager, only the version installed by pip install libclang.
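
Since the two libclang builds behave differently, one thing worth checking is whether the parse itself reports any problems in the failing setup. Nothing in this report confirms that it does; the following is only a debugging sketch. format_diagnostics and report are hypothetical names, and library_file stands for wherever the system libclang.so lives, which is left for the reader to supply.

```python
def format_diagnostics(diagnostics):
    """Render each diagnostic as 'file:line: severity=N: message'."""
    lines = []
    for diag in diagnostics:
        loc = diag.location
        fname = loc.file.name if loc.file is not None else "<unknown>"
        lines.append("%s:%d: severity=%d: %s"
                     % (fname, loc.line, diag.severity, diag.spelling))
    return lines


def report(path, library_file=None):
    """Print parse diagnostics, then the children of my_global as demo.py does."""
    # Imported lazily so format_diagnostics works without libclang installed.
    from clang.cindex import Config, Index

    if library_file is not None:
        # Has to be set before the first Index.create() in the process.
        Config.set_library_file(library_file)
    tu = Index.create().parse(path)
    for line in format_diagnostics(tu.diagnostics):
        print(line)
    for c in tu.cursor.get_children():
        if c.spelling == "my_global":
            print("%s: %s" % (path, [child.kind for child in c.get_children()]))
```

Running report("test1-bad.c") once with the pip-installed library and once in a fresh process with library_file pointing at the system library would allow the two outputs to be compared side by side.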