Fix sccache for CTK 11.1 and properly track compilations in stats

trxcllnt commented 1 week ago

This PR has some fixes I neglected to add to https://github.com/mozilla/sccache/pull/2247.

nvcc in CUDA toolkit v11.1 didn't add the -D__CUDA_ARCH_LIST__= definition, so https://github.com/mozilla/sccache/commit/5271494812b237812e90aa89bcb9a9c14f9d91cf expands the list of defines that indicate an nvcc host compiler invocation.
https://github.com/mozilla/sccache/commit/f160a0a618c9f48899489960e4cff432d22f43f9 and https://github.com/mozilla/sccache/commit/15910897454628718112f3ce935a25f22e0a7f0b report compilation type (local or dist) and duration for forced-no-cache compilations, forced-recache compilations, and compilation failures. They also count and report the total number of compilations performed, not just compilations caused by cache misses.
https://github.com/mozilla/sccache/commit/ccfc60b5ff183caa87c10921c8080f5182c56021 ensures compilations with --verbose are never dist-compiled, since the verbose output is parsed by tools like CMake and must reflect the local toolchain.
https://github.com/mozilla/sccache/commit/bdaf35eda55c4092743a2c15facb455ea5efe0d1 adds more clang flags so that using clang as a CUDA compiler with -Xclang doesn't fail.
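
To illustrate the first and third points, here is a minimal Rust sketch of the kind of argument inspection involved. It is not the code from the commits above: the function names `is_nvcc_host_invocation` and `must_compile_locally` and the constant `NVCC_HOST_DEFINES` are hypothetical. The macros listed are ones nvcc is known to pass to its host compiler, but which ones sccache actually checks is an assumption here.

```rust
/// Macros that nvcc defines on the host compiler command line.
/// ASSUMPTION: illustrative list only; the set sccache really checks may differ.
const NVCC_HOST_DEFINES: &[&str] = &["__CUDA_ARCH_LIST__", "__CUDACC__", "__NVCC__"];

/// Returns true if any argument is `-D<name>` or `-D<name>=<value>` where
/// `<name>` is one of NVCC_HOST_DEFINES. Checking several defines lets the
/// detection work for toolkits (such as CTK 11.1) that do not pass
/// `-D__CUDA_ARCH_LIST__=`.
fn is_nvcc_host_invocation<S: AsRef<str>>(args: &[S]) -> bool {
    args.iter().any(|arg| match arg.as_ref().strip_prefix("-D") {
        Some(define) => {
            // Keep only the macro name, dropping any `=value` suffix.
            let name = define.split('=').next().unwrap_or(define);
            NVCC_HOST_DEFINES.contains(&name)
        }
        None => false,
    })
}

/// Returns true if the invocation should never be dist-compiled.
/// This sketch only looks for the literal `--verbose` flag and does not
/// handle short aliases.
fn must_compile_locally<S: AsRef<str>>(args: &[S]) -> bool {
    args.iter().any(|arg| arg.as_ref() == "--verbose")
}

fn main() {
    // A host invocation without __CUDA_ARCH_LIST__, as described for CTK 11.1.
    let host_args = ["-D__CUDACC__", "-D__NVCC__", "-E", "-x", "c++", "test.cu"];
    println!("nvcc host invocation: {}", is_nvcc_host_invocation(&host_args));

    // A verbose invocation whose output must reflect the local toolchain.
    let verbose_args = ["--verbose", "-c", "test.cu"];
    println!("must compile locally: {}", must_compile_locally(&verbose_args));
}
```

Matching on the macro name rather than the full argument means both `-D__CUDACC__` and `-D__CUDA_ARCH_LIST__=800` are recognized, while unrelated defines such as `-DNDEBUG` are ignored.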

Question for @sylvestre related to the last point -- do you know which bits of the clang toolchain (or CTK?) sccache should package when using clang as a device compiler? I am seeing errors like the following when attempting to dist-compile with ClangCUDA, but I'm not sure which files define the __nvvm_* symbols:

In file included from build/libcudacxx/test/internal_headers/headers/__barrier_async_contract_fulfillment.h.cu:1:
In file included from <built-in>:1:
In file included from /usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_runtime_wrapper.h:73:
/usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_builtin_vars.h:53:180: error: use of undeclared identifier '__nvvm_read_ptx_sreg_tid_x'
   53 |   __declspec(property(get = __fetch_builtin_x)) unsigned int x; static inline __attribute__((always_inline)) __attribute__((device)) unsigned int __fetch_builtin_x(void) { return __nvvm_read_ptx_sreg_tid_x(); };
...

sylvestre commented 3 days ago

Sorry, I don't know.

trxcllnt commented 1 day ago

This doesn't seem to be an issue with sccache. It appears clang can't compile its own preprocessor output:

#!/usr/bin/env bash

# Basic CUDA example from https://godbolt.org/
cat <<EOF >/tmp/test.cu
__global__ void square(int* array, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n)
        array[tid] = array[tid] * array[tid];
}
EOF

# Preprocess only (-E) and save the preprocessed CUDA source
clang++ -x cuda -E --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version /tmp/test.cu > /tmp/test.cui

# Compile the preprocessed output from the previous step (fails)
clang++ -x cuda-cpp-output --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -o /tmp/test.cu.o /tmp/test.cui

Fix sccache for CTK 11.1 and properly track compilations in stats #2285