sighingnow / libclang

(Unofficial) Release libclang (clang.cindex) on pypi.
https://pypi.org/project/libclang
Other

Including the clang std library #40

Closed JhnW closed 1 year ago

JhnW commented 1 year ago

At the outset, I would like to point out that this is not a bug report but a quality-of-life improvement request. You may not want to do this, but in that case I need some information so that I can work around it myself. The whole thing needs a bit of explanation.

Problem: No clang tool can parse all of glibc out of the box. From what I've noticed, the main problem is stddef.h. Glibc does not ship this file as a physical dependency; it is internal to the compiler. However, when parsing a library, clang runs into this include in many cases. If it cannot be resolved, we have a problem. Other files may hold a similar surprise, but I don't know of any.

How others deal with it: Apart from tools, most users install the whole of clang and either use its standard library for parsing or just add the problematic file to the include path. Qt Creator is an interesting case. It carries the whole clang std in its sources and adds the problematic file from there to the include paths (clang code model). You can see this when clicking through to the source of an include. Working under gcc and clicking on an include of vector, string, etc. takes us to our compiler's std sources, but clicking on stddef.h makes the IDE show the internal clang file it uses :)

What I propose: I came across this problem when I wanted to add full support for "external modules" together with the std library in my project https://github.com/JhnW/devana. I can include clang's std in the project for my own needs. However, it seems to me that it would be better if clang's std (or at least stddef.h) came together with your package. Since the purpose of this package is to make installing native dependencies easier, the std needed to fully parse gcc code falls within that scope. Additionally, it's a version management issue. It is best to use the version of clang's std that matches the clang version libclang was built from. So it would be nice to have version matching managed by libclang.

If you want to add this dependency, a simple function returning the current dependency path would be appreciated. If not, just let me know, I will maintain such a dependency as part of my project (although as I wrote, from the version maintenance point of view, this is not the best).
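
A minimal sketch of what such a path-returning helper might look like. `builtin_include_dir` is a hypothetical name and is not part of the libclang package; the sketch assumes the headers would sit in an `include` directory under some resource root, which is the layout clang itself uses under the directory printed by `clang -print-resource-dir`.

```python
from pathlib import Path
from typing import Optional


def builtin_include_dir(resource_root: str) -> Optional[str]:
    """Return <resource_root>/include if it contains stddef.h, else None.

    Hypothetical helper for illustration only; the libclang package
    does not ship a function with this name.
    """
    include_dir = Path(resource_root) / "include"
    # stddef.h is used as the marker file because it is the header
    # discussed in this issue.
    if (include_dir / "stddef.h").is_file():
        return str(include_dir)
    return None
```

A caller could then add `-isystem` followed by the returned directory to the `args` passed to `clang.cindex.Index.parse`, so that the compiler-internal headers are found during parsing.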

sighingnow commented 1 year ago

I have no experience with stddef.h, but I would like to share that we use -nostdinc -nostdinc++ and infer the full set of implicit and explicit include directories when using libclang to parse C++ sources. That seems to work on Linux with GCC as well.
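
As a rough sketch of that approach (not the code from the project mentioned above), the arguments could be assembled like this. `build_parse_args` is a hypothetical helper, and the list of include directories is assumed to have been collected elsewhere. Note that in clang, `-nostdinc` also disables the compiler's built-in include directory, so the directory holding stddef.h has to be part of the list.

```python
from typing import Iterable, List


def build_parse_args(include_dirs: Iterable[str], std: str = "c++17") -> List[str]:
    """Build clang arguments that drop the default search paths and
    use only the explicitly supplied system include directories.

    Hypothetical helper for illustration; not part of clang.cindex.
    """
    args = ["-x", "c++", f"-std={std}", "-nostdinc", "-nostdinc++"]
    for directory in include_dirs:
        # -isystem keeps the order given by the caller, which should
        # match the order the real compiler searches in.
        args += ["-isystem", directory]
    return args
```

The resulting list can be passed as the `args` parameter of `clang.cindex.Index.parse`.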

See also:

Hope the above information could be helpful.

JhnW commented 1 year ago

Hi. Thanks for your willingness to help. However, I see that in the linked places you approach this problem the same way as everyone else: by replacing part or all of the stdlib. It is not possible to properly parse C++ code that uses the standard library under Linux without replacing glibc (for parsing purposes) with the clang library, or at least providing the missing compiler-internal files (see the Qt Creator clang code model example).

Anyway, we still have to decide something. You can just leave it as is; then anyone using libclang must obtain clang's std themselves if needed. An improvement would be to ship a copy of clang's std (headers) with the parser. The main benefit is keeping versions consistent. Otherwise, anyone who uses libclang and needs the standard library (and needs to parse its dependencies) would also have to check the header version manually each time the Python dependency is upgraded (although in most cases old headers should not cause any problems).

aburdulescu commented 1 year ago

FYI, there's a page full of information about this topic on clangd's website: https://clangd.llvm.org/guides/system-headers

In my experience, the query-driver trick always works for clang/gcc-like compilers, and I do it by default in all projects where I use libclang.
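
For readers unfamiliar with the trick, here is a minimal sketch of one way to do it by hand: run the compiler in verbose preprocess-only mode and read the `#include <...>` search list it prints on stderr. The function names are hypothetical, and the sketch assumes a gcc- or clang-style driver that prints the usual "search starts here" block.

```python
import subprocess
from typing import List


def parse_search_list(verbose_output: str) -> List[str]:
    """Extract the <...> include directories from `-v` driver output."""
    dirs: List[str] = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list and line.startswith(" "):
            path = line.strip()
            # macOS drivers also list framework directories; skip them.
            if not path.endswith("(framework directory)"):
                dirs.append(path)
    return dirs


def query_driver(compiler: str = "g++") -> List[str]:
    """Ask the given compiler for its system include directories."""
    result = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="",             # empty translation unit on stdin
        capture_output=True,
        text=True,
        check=False,
    )
    # The search list is printed on stderr, not stdout.
    return parse_search_list(result.stderr)
```

The directories returned by `query_driver()` can be handed to libclang as `-isystem` arguments.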

JhnW commented 1 year ago

As far as I know, the query is carried out automatically. Using clang "out of the box", we get the paths of the gcc standard library (on Linux). Everything is fine at this stage. However, if we want to parse a translation unit deeply, sooner or later we run into a problem with built-in headers like stddef.h. Glibc refers to a header that it does not provide, because the compiler is expected to supply it internally, and clang does not do that here. Tool developers usually deal with this by using libcxx (std for clang) for parsing. This is easy to see in, for example, the clang code model in Qt Creator: following a reference to stddef.h takes us to the clang file even when another compiler is in use.

For this reason, I suggested that, together with the native library, the Python package should provide an access point to the clang standard library in the same version. This is a quality-of-life change that would also make dependent packages easier to maintain.