Non-empty compiler output

stavoltafunzia commented 1 month ago

I always get a non-empty compiler output for any kernel I compile. The complete warning message says:

CompilerWarning: Built kernel retrieved from cache. Original from-source build had warnings:
Build on <pyopencl.Device 'NVIDIA GeForce RTX 4070 SUPER' on 'NVIDIA CUDA' at 0x41e0140> succeeded, but said:

(): Warning: Function simple_mult is a kernel, so overriding noinline attribute. The function may be inlined when called.

To Reproduce Sample code to reproduce:

import pyopencl as cl

src = r"""
void __kernel simple_mult(__global const int *A, __global int *B) 
{
    B[get_global_id(0)] = A[get_global_id(0)] * 3;
}
"""

device = cl.get_platforms()[0].get_devices()[0]
cl_ctx =  cl.Context(devices=[device])
queue = cl.CommandQueue(cl_ctx)
prg = cl.Program(cl_ctx, src).build()

Expected behavior The compiler output should be empty. When I use my OpenCL from C/C++ code, clGetProgramBuildInfo returns empty messages.

Environment (please complete the following information):

OS: Debian Bookworm
ICD Loader and version: libnvidia-opencl from Cuda 12.4
ICD and version: libnvidia-opencl from Cuda 12.4
CPU/GPU: Nvidia RTX 4000 series
- Python version: 3.11
- PyOpenCL version: 2024.2.2

inducer commented 1 month ago

I'm a bit puzzled why this behavior should be different between PyOpenCL and a C++ program calling OpenCL directly. One possible reason that these messages got cached from an old version of the driver. You can check for this by deleting PyOpenCL's build cache:

# Careful! Double check this command before running it, to ensure it does what you intend.
rm -Rf $HOME/.cache/pyopencl

and then rerunning.

matthiasdiener commented 1 month ago

FWIW, I was not able to reproduce this with CUDA 12.2 on Debian unstable building for a TITAN X.

stavoltafunzia commented 1 month ago

I'm a bit puzzled why this behavior should be different between PyOpenCL and a C++ program calling OpenCL directly. One possible reason that these messages got cached from an old version of the driver. You can check for this by deleting PyOpenCL's build cache:
# Careful! Double check this command before running it, to ensure it does what you intend.
rm -Rf $HOME/.cache/pyopencl  
and then rerunning.

Thanks, tried it, but unfortunately didn't work for me.

I also verified that compiled C code and pyopencl are indeed using the same opencl library. With strace I see that both programs open the following library:

openat(AT_FDCWD, "glibc-hwcaps/x86-64-v3/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v2/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=84758, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 84758, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f35713dd000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 3

I really have no idea why this message is originated.

stavoltafunzia commented 1 month ago

Finally I found it. After inserting print statements to the pyopencl C++ code, I noticed that the following line was added to the kernel source:

__constant int pyopencl_defeat_cache_14d61c4d6ee748c9a9cef2d50121f8ef = 0;

If I remove such line (modifying pyopencl C++ code) I don't get anymore the compiler warning. Adding such line to my C++ opencl kernel, makes me getting the same build log. So that's the reason in the end, and pyopencl is consistent with C/C++ opencl interface.

Update: this is not the real cause.

inducer commented 1 month ago

Interesting! Thanks for tracking this down, I had forgotten about that. :) I still kind of don't understand why having this triggers the warning it does; the warning seems entirely unrelated to that variable definition?

stavoltafunzia commented 1 month ago

Yea, the compiler message seems totally unrelated to that variable, yet it’s anyway triggered by it. Don’t know what nvidia is doing here; we all know OpenCL is not bvidia top priority (to say an euphemism). Btw, for curiosity, why that constant variable is added to the kernel source code? Looks like it’s related to pyopencl caching system. Is there an easy way to disable it?

inducer commented 1 month ago

The reason the variable is there is to defeat broken vendor caches. I don't remember specifics, but in PyOpenCL's early days, I spent a long time tracking down what ended up being a bug in an ICD compiler cache. The ICD compiler did not notice that a header file included by the source was changed, and insisted on using a (stale) cached binary. That variable definition was there to help "convince" ICDs that they're looking at new source code every time, while PyOpenCL's own caching system is (hopefully) less broken than the ones built into the ICD. That said, for some specific ICDs that (competently) do their own caching, PyOpenCL's caching system imposes unnecessary overhead, which we're now thinking of (selectively) removing. See #738 for some discussion.

stavoltafunzia commented 1 month ago

I now realized I gave a wrong explanation. I've been get confused by (I think) some build caching mechanism that the nvidia ICD compiler is apparently using. From C interface, the ICD compiler builds the kernel, and get non empty build info, only the first time I execute the program, while in all subsequent program executions clBuildProgram (I think that) uses some cache and clGetProgramBuildInfo returns an empty message only because (I think that) the build info are not cached. In conclusion, the line below is not triggering the non-empty build log. I do get a non-empty build log even from the C interface, though only the first time I compile a kernel (I suppose due to the caching mechanism mentioned above).

__constant int pyopencl_defeat_cache_14d61c4d6ee748c9a9cef2d50121f8ef = 0;

In the end, nothing is due to pyopencl.

inducer commented 1 month ago

Glad to hear everything got resolved. I'll go ahead and close this issue, LMK if anything else comes up.

inducer / pyopencl

Non-empty compiler output #756