jnikula / hawkmoth

Hawkmoth - Sphinx Autodoc for C
https://jnikula.github.io/hawkmoth/
BSD 2-Clause "Simplified" License

Clang include args still needed on Arch #262

Open heftig opened 3 weeks ago

heftig commented 3 weeks ago

I had to revert 124385241fbb3d3181a0420e56998d37cb328f1e in order to get the test suite to pass again for the Arch Linux package build. Otherwise the tests would fail to find include files like stddef.h and stdbool.h.

Is this an issue with our Clang?

jnikula commented 3 weeks ago

Thanks for the report. I've come to believe libclang should be able to figure the standard include paths out for itself.

I tried to debug this on an archlinux:latest container, and it's a bit odd.

libclang C header search path (via hawkmoth)

$ hawkmoth --clang=-E --clang=-Wp,-v --domain=c /tmp/empty.c
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.

clang C header search path

$ clang -E -Wp,-v -xc /tmp/empty.c 
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/clang/18/include
 /usr/local/include
 /usr/include
End of search list.

libclang C++ header search path (via hawkmoth)

$ hawkmoth --clang=-E --clang=-Wp,-v --domain=cpp /tmp/empty.cpp
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1
 /../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu
 /../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/backward
 /usr/local/include
 /usr/include
End of search list.

clang C++ header search path

$ clang -E -Wp,-v -xc++ /tmp/empty.cpp
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1
 /usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu
 /usr/sbin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/backward
 /usr/lib/clang/18/include
 /usr/local/include
 /usr/include
End of search list.

Errors

Both the C and the C++ preprocessing runs fail to find the C headers through libclang, apparently because the directory "lib/clang/18/include" is ignored as nonexistent:

E         + ERROR: bool.c:1: 'stdbool.h' file not found

E         + ERROR: c++/14.2.1/cstddef:50: 'stddef.h' file not found

It's just really odd that libclang would try to use "lib/clang/18/include", and it does make me wonder if there's an issue with the clang installation, or if some package is missing. Currently this works as-is on several versions of Debian, Ubuntu, Fedora and Alpine.

EDIT: Since the files are there under /usr/lib/clang/18/include, I lean towards a bug rather than a missing package.
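The suspicious entry can also be picked out of the `-Wp,-v` output mechanically. The sketch below is only an illustration and not part of hawkmoth; the helper name `relative_ignored_dirs` is made up here. It lists the "ignoring nonexistent directory" entries that are not absolute paths, which is the kind of entry that ends up being resolved against the current working directory.

```python
import re

# Matches lines such as: ignoring nonexistent directory "lib/clang/18/include"
_IGNORED = re.compile(r'^ignoring nonexistent directory "(.*)"$')


def relative_ignored_dirs(output):
    """Return ignored search directories that are not absolute paths."""
    found = []
    for line in output.splitlines():
        match = _IGNORED.match(line.strip())
        if match and not match.group(1).startswith("/"):
            found.append(match.group(1))
    return found


# Output copied from the libclang C run in this comment.
LIBCLANG_OUTPUT = """\
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
"""

print(relative_ignored_dirs(LIBCLANG_OUTPUT))
# → ['lib/clang/18/include']
```

The other two ignored entries start with "/", so they are at least absolute, even if they point nowhere useful.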

heftig commented 3 weeks ago

Thanks. To reproduce without hawkmoth:

> python -c 'import clang.cindex; clang.cindex.Index.create().parse("/dev/null", ["-x", "c", "-E", "-Wp,-v"])'
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
> clang -x c -E -Wp,-v /dev/null
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/clang/18/include
 /usr/local/include
 /usr/include
End of search list.

So the Python version uses a pwd-relative path when it should be anchored at /usr (the system prefix) instead.
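One possible workaround, sketched here under assumptions, is to pass the resource directory to libclang explicitly with clang's `-resource-dir` option, so the builtin headers are no longer looked up relative to the working directory. The helpers `with_resource_dir` and `parse_with_resource_dir` are hypothetical names, and `/usr/lib/clang/18` is simply the directory that holds `include` in the clang output above; it would differ on other systems and versions.

```python
def with_resource_dir(args, resource_dir):
    """Return a copy of args with -resource-dir appended, unless already set."""
    args = list(args)
    already_set = any(
        a == "-resource-dir" or a.startswith("-resource-dir=") for a in args
    )
    if not already_set:
        args += ["-resource-dir", resource_dir]
    return args


def parse_with_resource_dir(path, args, resource_dir):
    """Parse path through the Python bindings with an explicit resource dir."""
    # Imported lazily so the helper above works without libclang installed.
    import clang.cindex

    index = clang.cindex.Index.create()
    return index.parse(path, with_resource_dir(args, resource_dir))
```

A call such as `parse_with_resource_dir("/dev/null", ["-x", "c", "-E", "-Wp,-v"], "/usr/lib/clang/18")` would repeat the reproducer with the extra argument. Whether that gives the same search list as the clang binary on Arch has not been verified here.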

@foutrelis Any idea?

jnikula commented 3 weeks ago

@heftig Thanks for creating the reproducer without hawkmoth.

I looked at what the other distros have. It's actually not as rosy as I believed it to be:

Debian/Ubuntu

Alpine

Fedora

heftig commented 3 weeks ago

Debian applies this patch, which looks relevant: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/18/debian/patches/fix-clang-path-and-build.diff

jnikula commented 3 weeks ago

> Debian applies this patch, which looks relevant: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/18/debian/patches/fix-clang-path-and-build.diff

I wonder if there's a libclang upstream bug about this. Or if this is about some downstream build configuration?

BrunoMSantos commented 2 weeks ago

I blame the Arch-using reviewer :stuck_out_tongue:

Anyway, if Fedora also has this issue... Maybe it is worthwhile reverting this ourselves for now to avoid the pain for unwitting packagers? What do you think @jnikula? It may also help users understand how to fix their projects if they have our example to go by.

> I wonder if there's a libclang upstream bug about this. Or if this is about some downstream build configuration?

I think you found it already about a month ago :) Linking it here for convenience: https://github.com/llvm/llvm-project/issues/18150.

-->8--

No real surprises here, but I did do a quick check to see that this is indeed libclang's fault and not the bindings'. I also did this before looking for a bug upstream and, following the code around, I found the same difference between clang and libclang in deducing the header path as the ticket author.

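#include <clang-c/Index.h>
#include <iostream>

int main() {
    CXIndex index = clang_createIndex(0, 0);

    // Same arguments as the Python reproducer: preprocess as C and
    // print the header search path.
    const char *args[] = {"-x", "c", "-E", "-Wp,-v"};
    const int num_args = sizeof(args) / sizeof(args[0]);

    CXTranslationUnit unit =
        clang_parseTranslationUnit(index, "/dev/null",
                       args, num_args, nullptr, 0,
                       CXTranslationUnit_None);

    if (unit == nullptr) {
        std::cerr << "Unable to parse translation unit. Quitting.\n";
        clang_disposeIndex(index);
        return 1;
    }

    // Print any diagnostics attached to the translation unit.
    for (unsigned i = 0; i < clang_getNumDiagnostics(unit); ++i) {
        CXDiagnostic diag = clang_getDiagnostic(unit, i);
        CXString string = clang_getDiagnosticSpelling(diag);
        std::cout << clang_getCString(string) << std::endl;
        clang_disposeString(string);
        clang_disposeDiagnostic(diag);
    }

    // Release libclang resources before exiting.
    clang_disposeTranslationUnit(unit);
    clang_disposeIndex(index);
    return 0;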
}

$ clang++ -lclang a.cpp && ./a.out
ignoring nonexistent directory "lib/clang/18/include"
ignoring nonexistent directory "/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
argument unused during compilation: '-fsyntax-only'

jnikula commented 2 weeks ago

> Anyway, if Fedora also has this issue... Maybe it is worthwhile reverting this ourselves for now to avoid the pain for unwitting packagers? What do you think @jnikula? It may also help users understand how to fix their projects if they have our example to go by.

I think the approach with hawkmoth.util.compiler.get_include_args() may be misguided. For one, it dates back to the time without C++ support. It needs to be language specific. For another, it doesn't always seem to be that straightforward to match the command-line compiler and libclang. See for example my attempts at getting CI running on GitHub macOS images, which have at least three different installations. How is the Python bindings user supposed to figure out the compiler frontend to go with the libclang version the Python bindings are using?

BrunoMSantos commented 2 weeks ago

>> Anyway, if Fedora also has this issue... Maybe it is worthwhile reverting this ourselves for now to avoid the pain for unwitting packagers? What do you think @jnikula? It may also help users understand how to fix their projects if they have our example to go by.
>
> I think the approach with hawkmoth.util.compiler.get_include_args() may be misguided. For one, it dates back to the time without C++ support. It needs to be language specific. For another, it doesn't always seem to be that straightforward to match the command-line compiler and libclang. See for example my attempts at getting CI running on GitHub macOS images, which have at least three different installations. How is the Python bindings user supposed to figure out the compiler frontend to go with the libclang version the Python bindings are using?

Maybe I misunderstood something then. Seems like get_include_args(cc_path) does this already depending on which cc_path (poorly named, I guess) is given. It probably should be passed clang++ instead of clang in certain contexts though and possibly a specific /imspecial/macos/ prefix in others.

Kind of related, I've wondered before if a better default would actually be -nostdinc or -nostdinc <similar to get_include_args(config.hawkmoth_compiler) but with -isystem instead of -I>... But I digress, assuming clang by default is not wrong.
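The `-nostdinc` plus `-isystem` idea could look roughly like the sketch below. It is not hawkmoth's `get_include_args()`; `parse_search_dirs` and `isystem_args` are hypothetical names, and the sample text is the clang C output from earlier in this thread. The parser collects the directories listed between the `#include <...> search starts here:` and `End of search list.` markers.

```python
def parse_search_dirs(output):
    """Extract the <...> search directories from `clang -E -Wp,-v` output."""
    dirs = []
    in_list = False
    for line in output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            in_list = False
        elif in_list and line.startswith(" "):
            dirs.append(line.strip())
    return dirs


def isystem_args(output):
    """Build -nostdinc plus one -isystem pair per search directory."""
    args = ["-nostdinc"]
    for directory in parse_search_dirs(output):
        args += ["-isystem", directory]
    return args


# Output of `clang -E -Wp,-v -xc /tmp/empty.c`, copied from this thread.
CLANG_OUTPUT = """\
clang -cc1 version 18.1.8 based upon LLVM 18.1.8 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/clang/18/include
 /usr/local/include
 /usr/include
End of search list.
"""

print(isystem_args(CLANG_OUTPUT))
# → ['-nostdinc', '-isystem', '/usr/lib/clang/18/include',
#    '-isystem', '/usr/local/include', '-isystem', '/usr/include']
```

This only handles the plain directory lines shown in this thread, and it still depends on running a clang binary that matches the libclang in use.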

jnikula commented 2 weeks ago

> Maybe I misunderstood something then. Seems like get_include_args(cc_path) does this already depending on which cc_path (poorly named, I guess) is given. It probably should be passed clang++ instead of clang in certain contexts though and possibly a specific /imspecial/macos/ prefix in others.

My point is, we don't really specify which libclang to use. We don't have any discovery of our own for that. The user may specify it through a few different ways, and the Python bindings do the rest. How are we supposed to know which clang binary to use for figuring out the header path? There could be subtle differences between different clang versions for example.

Maybe we should indeed use something like llvm-config --prefix and pass that to -isystem. But even then, which llvm-config to use?
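As a rough sketch of that idea: given a prefix such as the one `llvm-config --prefix` prints, look for `lib/clang/<version>/include` underneath it and turn each hit into an `-isystem` argument. The directory layout is an assumption based on the `/usr/lib/clang/18/include` path seen earlier in this thread, `resource_include_args` is a made-up name, and the sketch sidesteps the question of which `llvm-config` to ask by taking the prefix as a parameter.

```python
from pathlib import Path


def resource_include_args(prefix):
    """Return -isystem args for every <prefix>/lib/clang/<version>/include."""
    root = Path(prefix) / "lib" / "clang"
    args = []
    # More than one version may be installed; the caller still has to
    # decide which one matches the libclang in use.
    for include_dir in sorted(root.glob("*/include")):
        if include_dir.is_dir():
            args += ["-isystem", str(include_dir)]
    return args
```

For a prefix of `/usr` on the Arch container above, this would be expected to find `/usr/lib/clang/18/include`, assuming no other clang versions are installed under the same prefix.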

jnikula commented 2 weeks ago

Oh, I also don't think there's any requirement that the clang frontend should be packaged with libclang. It may be a different dependency. I think I've certainly seen llvm-config packaged separately, but I don't remember right now on which distro.