llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

libclang (via Python bindings): is_definition() differs for structs on x86_64 and aarch64 #20133

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 19759
Version 3.4
OS other
Reporter LLVM Bugzilla Contributor

Extended Description

We have a Python script that uses libclang. It walks over the AST to collect information about structures. While porting the code to aarch64 (ARMv8), I found that the script produces no output. The cause is that is_definition() returns different values on x86_64 and aarch64: True for the struct definition on x86_64, but False on aarch64.

Example:

/usr/include/time.h

Fedora 19, AArch64 (ARMv8)

[2014-05-15 16:03:42,980] DEBUG: Child: <clang.cindex.Cursor object at 0x17e993b0> | timespec | timespec, kind: CursorKind.STRUCT_DECL, is_definition: False, location: <SourceLocation file '/usr/include/time.h', line 120, column 8>

118 /* POSIX.1b structure for a time value.  This is like a `struct timeval' but
119    has nanoseconds instead of microseconds.  */
120 struct timespec
121   {
122     __time_t tv_sec;    /* Seconds.  */
123     __syscall_slong_t tv_nsec;  /* Nanoseconds.  */
124   };

RHEL6, x86_64

[2014-05-15 21:58:46,767] DEBUG: Child: <clang.cindex.Cursor object at 0x2468560> | timespec | timespec, kind: CursorKind.STRUCT_DECL, is_definition: True, location: <SourceLocation file '/usr/include/time.h', line 120, column 8>
[2014-05-15 21:58:46,767] DEBUG: Found struct/class/template definition: timespec
[2014-05-15 21:58:46,767] DEBUG: Skipping since it is an external of this package: timespec

118 /* POSIX.1b structure for a time value.  This is like a `struct timeval' but
119    has nanoseconds instead of microseconds.  */
120 struct timespec
121   {
122     __time_t tv_sec;    /* Seconds.  */
123     long int tv_nsec;   /* Nanoseconds.  */
124   };
125

However, running clang -Xclang -ast-dump -fsyntax-only on the same header gives:

Fedora 19, AArch64:

|-RecordDecl 0x85fb030 </usr/include/time.h:120:1, line:124:3> struct timespec definition
| |-FieldDecl 0x85fb100 <line:122:5, col:14> tv_sec '__time_t':'long'
| `-FieldDecl 0x85fb180 <line:123:5, col:23> tv_nsec '__syscall_slong_t':'long'

RHEL6, x86_64:

|-RecordDecl 0x39e1e70 </usr/include/time.h:120:1, line:124:3> struct timespec definition
| |-FieldDecl 0x39e1f40 <line:122:5, col:14> tv_sec '__time_t':'long'
| `-FieldDecl 0x39e1fa0 <line:123:5, col:14> tv_nsec 'long'

On both machines the AST dump reports "struct timespec definition", yet libclang's is_definition() returns False for it on aarch64.

A script like the following is enough to reproduce the problem:

$ cat check.py
from __future__ import print_function  # lets the script run on Python 2 and 3
import sys
import clang.cindex

def find_typerefs(node):
    # Recursively print every cursor with its kind, is_definition() and location.
    for child in node.get_children():
        print("{0} {1} {2} {3}".format(child.displayname, child.kind,
                                       child.is_definition(), child.location))
        find_typerefs(child)

index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1])
print('Translation unit:', tu.spelling)
find_typerefs(tu.cursor)

$ python check.py /usr/include/time.h

x86_64

timespec CursorKind.STRUCT_DECL True <SourceLocation file '/usr/include/time.h', line 120, column 8>
tv_sec CursorKind.FIELD_DECL True <SourceLocation file '/usr/include/time.h', line 122, column 14>
__time_t CursorKind.TYPE_REF False <SourceLocation file '/usr/include/time.h', line 122, column 5>
tv_nsec CursorKind.FIELD_DECL True <SourceLocation file '/usr/include/time.h', line 123, column 14>

aarch64

timespec CursorKind.STRUCT_DECL False <SourceLocation file '/usr/include/time.h', line 120, column 8>
tv_sec CursorKind.FIELD_DECL True <SourceLocation file '/usr/include/time.h', line 122, column 14>
__time_t CursorKind.TYPE_REF False <SourceLocation file '/usr/include/time.h', line 122, column 5>
tv_nsec CursorKind.FIELD_DECL True <SourceLocation file '/usr/include/time.h', line 123, column 23>
__syscall_slong_t CursorKind.TYPE_REF False <SourceLocation file '/usr/include/time.h', line 123, column 5>

Is there an issue with libclang and/or the Python bindings on aarch64?

llvmbot commented 10 years ago

I switched to trunk LLVM and Clang; the behavior is unchanged. Here is a smaller example.

$ cat my.h
struct timespec
{
  int tv_sec;
  int tv_nsec;
};

$ cat check.py
from __future__ import print_function  # lets the script run on Python 2 and 3
import sys
import clang.cindex

def find_all(node):
    # Print each cursor, plus the location its definition resolves to (if any).
    for child in node.get_children():
        print("displayname: {0}, kind: {1}, is_definition: {2}, location:{3}".format(
            child.displayname, child.kind, child.is_definition(), child.location))
        definition = child.get_definition()
        if definition is not None:
            print(">> get_definition().location: {0}".format(definition.location))
        find_all(child)

index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1])
print('Translation unit:', tu.spelling)
find_all(tu.cursor)

Fedora 20 / x86_64

$ python check.py my.h
Translation unit: my.h
displayname: __int128_t, kind: CursorKind.TYPEDEF_DECL, is_definition: True, location:<SourceLocation file None, line 0, column 0>
>> get_definition().location: <SourceLocation file None, line 0, column 0>
displayname: __uint128_t, kind: CursorKind.TYPEDEF_DECL, is_definition: True, location:<SourceLocation file None, line 0, column 0>
>> get_definition().location: <SourceLocation file None, line 0, column 0>
displayname: __builtin_va_list, kind: CursorKind.TYPEDEF_DECL, is_definition: True, location:<SourceLocation file None, line 0, column 0>
>> get_definition().location: <SourceLocation file None, line 0, column 0>
displayname: __va_list_tag, kind: CursorKind.TYPE_REF, is_definition: False, location:<SourceLocation file None, line 0, column 0>
>> get_definition().location: <SourceLocation file None, line 0, column 0>
displayname: timespec, kind: CursorKind.STRUCT_DECL, is_definition: True, location:<SourceLocation file 'my.h', line 1, column 8>
>> get_definition().location: <SourceLocation file 'my.h', line 1, column 8>
displayname: tv_sec, kind: CursorKind.FIELD_DECL, is_definition: True, location:<SourceLocation file 'my.h', line 3, column 7>
>> get_definition().location: <SourceLocation file 'my.h', line 3, column 7>
displayname: tv_nsec, kind: CursorKind.FIELD_DECL, is_definition: True, location:<SourceLocation file 'my.h', line 4, column 7>
>> get_definition().location: <SourceLocation file 'my.h', line 4, column 7>

Fedora 19 / AArch64

$ python check.py my.h
Translation unit: my.h
displayname: timespec, kind: CursorKind.STRUCT_DECL, is_definition: False, location:<SourceLocation file 'my.h', line 1, column 8>
>> get_definition().location: <SourceLocation file 'my.h', line 1, column 8>
displayname: tv_sec, kind: CursorKind.FIELD_DECL, is_definition: True, location:<SourceLocation file 'my.h', line 3, column 7>
>> get_definition().location: <SourceLocation file 'my.h', line 3, column 7>
displayname: tv_nsec, kind: CursorKind.FIELD_DECL, is_definition: True, location:<SourceLocation file 'my.h', line 4, column 7>
>> get_definition().location: <SourceLocation file 'my.h', line 4, column 7>

llvmbot commented 10 years ago

It looks like the following check is what fails:

4708 unsigned clang_isCursorDefinition(CXCursor C) {
4709   if (!clang_isDeclaration(C.kind))
4710     return 0;
4711
4712   return clang_getCursorDefinition(C) == C;
4713 }

Specifically, the comparison on line 4712 evaluates to false.

Breakpoint 1, clang_isCursorDefinition (C=...) at /home/david/new-arch/test/BUILD/fc19_aarch64_gcc490/external/llvm/3.4-cms2/llvm-3.4-6800b6d2afc/tools/clang/tools/libclang/CIndex.cpp:4709
4709	  if (!clang_isDeclaration(C.kind))
(gdb) p C
$1 = {kind = CXCursor_ClassDecl, xdata = 0, data = {0x7fb39e60e0, 0x0, 0x7fb000cfb0}}
(gdb) p clang_getCString(clang_getCursorDisplayName(C))
$2 = 0x9a5cd0 "RunNumber"
(gdb) p C.
data   kind   xdata
(gdb) set $foo = clang_getCursorDefinition(C)
(gdb) p $foo
$3 = {kind = CXCursor_ClassDecl, xdata = 0, data = {0x7fb39e60e0, 0x1, 0x7fb000cfb0}}

However, printing the location of both cursors gives the same file, line, and column.

They compare unequal only because C.data[1] != clang_getCursorDefinition(C).data[1] (0x0 vs. 0x1 in the gdb output above); kind, xdata, data[0], and data[2] all match.
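The gdb session suggests why the check fails even though both cursors carry the same data[0] and data[2]: if the == on line 4712 compares every field, the single differing data[1] word is enough to make the cursors unequal. The Python model below reproduces that comparison with the values gdb printed. It is an illustration only: the `Cursor` named tuple, the assumption that == compares kind, xdata, and all three data words, and the masked comparison `equal_ignoring_data1` are hypothetical and inferred from the gdb output, not taken from libclang's source.

```python
from typing import NamedTuple, Tuple


class Cursor(NamedTuple):
    """Simplified stand-in for libclang's CXCursor (assumed layout)."""
    kind: str
    xdata: int
    data: Tuple[int, int, int]


# Values copied from the gdb session: C and clang_getCursorDefinition(C).
c = Cursor("CXCursor_ClassDecl", 0, (0x7FB39E60E0, 0x0, 0x7FB000CFB0))
definition = Cursor("CXCursor_ClassDecl", 0, (0x7FB39E60E0, 0x1, 0x7FB000CFB0))


def strict_equal(x, y):
    """Field-by-field comparison, as line 4712 appears to behave (assumption)."""
    # NamedTuple equality compares kind, xdata and every element of data.
    return x == y


def equal_ignoring_data1(x, y):
    """Hypothetical comparison that masks out data[1] before comparing."""
    def mask(cur):
        return cur._replace(data=(cur.data[0], 0, cur.data[2]))
    return mask(x) == mask(y)


print(strict_equal(c, definition))          # False: data[1] is 0x0 vs 0x1
print(equal_ignoring_data1(c, definition))  # True: every other field matches
```

The model only shows that the reported difference is sufficient to explain the False result; why data[1] ends up 0x0 for the visited cursor on aarch64 but 0x1 for the definition cursor is not established by the output above.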