Open heftig opened 3 weeks ago
Thanks for the report. I've come to believe libclang should be able to figure out the standard include paths for itself.
I tried to debug this on an archlinux:latest container, and it's a bit odd.
$ hawkmoth --clang=-E --clang=-Wp,-v --domain=c /tmp/empty.c
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/include
End of search list.
$ clang -E -Wp,-v -xc /tmp/empty.c
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/clang/18/include
/usr/local/include
/usr/include
End of search list.
$ hawkmoth --clang=-E --clang=-Wp,-v --domain=cpp /tmp/empty.cpp
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1
/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu
/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/backward
/usr/local/include
/usr/include
End of search list.
$ clang -E -Wp,-v -xc++ /tmp/empty.cpp
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1
/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu
/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/backward
/usr/lib/clang/18/include
/usr/local/include
/usr/include
End of search list.
Both the C and the C++ preprocessing runs fail to find the C headers through libclang, apparently because of: ignoring nonexistent directory "lib/clang/18/include"
E + ERROR: bool.c:1: 'stdbool.h' file not found
E + ERROR: c++/14.2.1/cstddef:50: 'stddef.h' file not found
It's just really odd that libclang would try to use "lib/clang/18/include", and it does make me wonder if there's an issue with the clang installation, or if some package is missing. Currently this works as-is on several versions of Debian, Ubuntu, Fedora and Alpine.
EDIT: Since the files are there under /usr/lib/clang/18/include, this makes me lean towards a bug rather than a missing package.
Thanks. To reproduce without hawkmoth:
> python -c 'import clang.cindex; clang.cindex.Index.create().parse("/dev/null", ["-x", "c", "-E", "-Wp,-v"])'
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/include
End of search list.
> clang -x c -E -Wp,-v /dev/null
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/clang/18/include
/usr/local/include
/usr/include
End of search list.
So the Python version uses a pwd-relative path when it should be anchored at /usr (the system prefix) instead.
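To compare the two outputs mechanically, the #include <...> search list can be pulled out of the -Wp,-v text. This is a minimal sketch, assuming the output format shown above; angle_search_dirs is a hypothetical helper name, not part of hawkmoth or the clang bindings, and the sample outputs are abridged from the two runs above.

```python
def angle_search_dirs(verbose_output):
    """Return the directories listed under '#include <...> search starts here:'."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped == "#include <...> search starts here:":
            in_list = True
        elif stripped == "End of search list.":
            break
        elif in_list and stripped:
            dirs.append(stripped)
    return dirs


# Abridged from the libclang (Python bindings) run above.
LIBCLANG_OUTPUT = """\
ignoring nonexistent directory "lib/clang/18/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
"""

# Abridged from the plain clang run above.
CLANG_OUTPUT = """\
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/clang/18/include
 /usr/local/include
 /usr/include
End of search list.
"""

# Directories that clang searches but libclang does not.
missing = [d for d in angle_search_dirs(CLANG_OUTPUT)
           if d not in angle_search_dirs(LIBCLANG_OUTPUT)]
print(missing)  # ['/usr/lib/clang/18/include']
```

The only difference is the builtin header directory, which matches the "ignoring nonexistent directory" line in the libclang output.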
@foutrelis Any idea?
@heftig Thanks for creating the reproducer without hawkmoth.
I looked at what the other distros have. It's actually not as rosy as I believed it to be:
- /usr/include/clang/16.0.6/include: it exists and works.
- ignoring nonexistent directory "lib/clang/16/include", although /usr/lib/clang/16/include exists and has all the headers.
- The musl-dev package provides the includes (I'm guessing at /usr/include), and it works. clang directly finds /usr/lib/llvm16/lib/clang/16/include, so the prefix is missing with libclang. /usr/lib/clang/16/ is a symlink to /usr/lib/llvm16/lib/clang/16.
- ../lib/clang/19/include, and it works with my docker containers, but alas maybe by coincidence. The containers have WORKDIR /src, and /lib/clang/19/include exists and works. cd somewhere else and boom: ignoring nonexistent directory "../lib/clang/19/include". clang directly finds /usr/bin/../lib/clang/19/include, so the prefix is missing with libclang.
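The WORKDIR coincidence can be illustrated with plain path arithmetic. This is a minimal sketch, assuming the relative directory is resolved against the current working directory, as the reproducer suggests; resolve_against is a hypothetical helper, /home/user/project is a made-up directory, and the resolution is purely lexical (symlinks are ignored).

```python
import posixpath


def resolve_against(cwd, relative_dir):
    """Lexically resolve relative_dir as a process running in cwd would see it."""
    return posixpath.normpath(posixpath.join(cwd, relative_dir))


# With WORKDIR /src the relative path happens to land on an existing directory.
print(resolve_against("/src", "../lib/clang/19/include"))
# /lib/clang/19/include

# From a deeper (hypothetical) directory the same relative path points elsewhere.
print(resolve_against("/home/user/project", "../lib/clang/19/include"))
# /home/user/lib/clang/19/include
```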
Debian applies this patch, which looks relevant: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/18/debian/patches/fix-clang-path-and-build.diff
I wonder if there's a libclang upstream bug about this. Or if this is about some downstream build configuration?
I blame the Arch-using reviewer :stuck_out_tongue:
Anyway, if Fedora also has this issue... Maybe it is worthwhile reverting this ourselves for now to avoid the pain for unwitting packagers? What do you think @jnikula? It may also help users understand how to fix their projects if they have our example to go by.
> I wonder if there's a libclang upstream bug about this. Or if this is about some downstream build configuration?
I think you found it already about a month ago :) Linking it here for convenience: https://github.com/llvm/llvm-project/issues/18150.
-->8--
No real surprises here, but I did do a quick check to see that this is indeed libclang's fault and not the bindings'. I also did this before looking for a bug upstream and, following the code around, I found the same difference between clang and libclang in deducing the header path as the ticket author.
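#include <clang-c/Index.h>
#include <iostream>

int main() {
    // Same arguments as the Python reproducer: preprocess as C, print search paths.
    const char *args[] = {"-x", "c", "-E", "-Wp,-v"};
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit unit =
        clang_parseTranslationUnit(index, "/dev/null",
                                   args, 4, nullptr, 0,
                                   CXTranslationUnit_None);
    if (unit == nullptr) {
        std::cerr << "Unable to parse translation unit. Quitting.\n";
        clang_disposeIndex(index);
        return 1;
    }
    // Print every diagnostic libclang collected for the translation unit.
    for (unsigned i = 0; i < clang_getNumDiagnostics(unit); ++i) {
        CXDiagnostic diag = clang_getDiagnostic(unit, i);
        CXString string = clang_getDiagnosticSpelling(diag);
        std::cout << clang_getCString(string) << std::endl;
        clang_disposeString(string);
        clang_disposeDiagnostic(diag);
    }
    // Release libclang resources before exiting.
    clang_disposeTranslationUnit(unit);
    clang_disposeIndex(index);
    return 0;
}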
$ clang++ -lclang a.cpp && ./a.out
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/include
End of search list.
argument unused during compilation: '-fsyntax-only'
> Anyway, if Fedora also has this issue... Maybe it is worthwhile reverting this ourselves for now to avoid the pain for unwitting packagers? What do you think @jnikula? It may also help users understand how to fix their projects if they have our example to go by.
I think the approach with hawkmoth.util.compiler.get_include_args() may be misguided. For one, it dates back to the time before C++ support, and it needs to be language specific. For another, it doesn't always seem to be that straightforward to match the command-line compiler and libclang. See for example my attempts at getting CI running on GitHub macOS images, which have at least three different installations. How is the Python bindings user supposed to figure out the compiler frontend to go with the libclang version the Python bindings are using?
Maybe I misunderstood something then. Seems like get_include_args(cc_path) does this already depending on which cc_path (poorly named, I guess) is given. It probably should be passed clang++ instead of clang in certain contexts though, and possibly a specific /imspecial/macos/ prefix in others.
Kind of related, I've wondered before if a better default would actually be -nostdinc or -nostdinc <similar to get_include_args(config.hawkmoth_compiler) but with -isystem instead of -I>... But I digress, assuming clang by default is not wrong.
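For illustration, the -nostdinc plus -isystem idea could look roughly like the following. This is only a sketch under assumptions: system_include_args is a hypothetical helper, not hawkmoth's get_include_args(), and the directory list is taken from the clang output earlier in the thread.

```python
def system_include_args(include_dirs, nostdinc=True):
    """Build clang arguments that add include_dirs as system include paths."""
    # -nostdinc drops the compiler's own default search path, so only the
    # directories listed explicitly below are searched.
    args = ["-nostdinc"] if nostdinc else []
    for directory in include_dirs:
        # -isystem (instead of -I) marks the directory as a system include path.
        args += ["-isystem", directory]
    return args


print(system_include_args(["/usr/lib/clang/18/include", "/usr/include"]))
# ['-nostdinc', '-isystem', '/usr/lib/clang/18/include', '-isystem', '/usr/include']
```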
My point is, we don't really specify which libclang to use. We don't have any discovery of our own for that. The user may specify it in a few different ways, and the Python bindings do the rest. How are we supposed to know which clang binary to use for figuring out the header path? There could be subtle differences between different clang versions, for example.
Maybe we should indeed use something like llvm-config --prefix and pass that to -isystem. But even then, which llvm-config to use?
Oh, I also don't think there's any requirement that the clang frontend should be packaged with libclang. It may be a different dependency. I think I've certainly seen llvm-config packaged separately, but I don't remember right now on which distro.
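A rough sketch of the llvm-config idea, with loud assumptions: resource_include_args is a hypothetical helper, the <prefix>/lib/clang/<major>/include layout is only what the outputs in this thread show (other distros and older LLVM releases may differ), and nothing here checks that the llvm-config on PATH matches the libclang the Python bindings load.

```python
import os
import subprocess


def resource_include_args(llvm_config="llvm-config", run=subprocess.check_output):
    """Guess the clang builtin header directory from llvm-config.

    Assumes the <prefix>/lib/clang/<major>/include layout seen in this thread.
    The run parameter exists so the command runner can be replaced in tests.
    """
    prefix = run([llvm_config, "--prefix"], text=True).strip()
    version = run([llvm_config, "--version"], text=True).strip()
    major = version.split(".")[0]
    include_dir = os.path.join(prefix, "lib", "clang", major, "include")
    return ["-isystem", include_dir]
```

The returned arguments could then be appended to whatever is passed to the parser. Which llvm-config binary belongs to the loaded libclang remains the open question.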
I had to revert 124385241fbb3d3181a0420e56998d37cb328f1e in order to get the test suite to pass again for the Arch Linux package build. Otherwise the tests would fail to find include files like stddef.h and stdbool.h.
Is this an issue with our Clang?