Open vawale opened 11 months ago
This sounds like a change in global destructor order has triggered a bug inside the Intel OpenCL driver. I can't think of anything that we do that could cause it to call a pure virtual function on one of its internal data structures. If we are doing something wrong with the OpenCL API, it's supposed to return an error code, not crash.
Yes, I think you are right. I do not get this issue if I use the intel-compute-runtime drivers, but those support only the GPU device type. I also do not get this issue with the pocl OpenCL implementation, which supports the CPU device type.
I will report this bug to the maintainers of https://www.intel.com/content/www/us/en/developer/articles/tool/opencl-drivers.html. Thanks for your help :)
Btw, from the stack trace it looks like the halide_opencl_device_release function is called after main exits. Why is the call to clFinish made after main exits? Is it called for any objects with static storage duration?
Yes, it's releasing any compiled shader programs, the command queue, and the context. Maybe we shouldn't do that.
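To make the timing concrete, here is a minimal sketch of cleanup that runs after main returns. It is not Halide's code: event_log and ReleaseOnExit are hypothetical names, and the sketch assumes the cleanup is attached to the destructor of an object with static storage duration. The real runtime may register its hook differently.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical event log, for illustration only. As a function-local static it
// is constructed on first use, so it outlives the hook object defined below.
std::vector<std::string> &event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a runtime that frees device state at exit.
// This is NOT Halide's code; it only mimics the timing of the cleanup.
struct ReleaseOnExit {
    ReleaseOnExit() { event_log().push_back("hook registered"); }
    ~ReleaseOnExit() {
        // Runs during static destruction, i.e. after main() has returned.
        // A real runtime would talk to the driver here (e.g. clFinish), which
        // is only safe if the driver's own globals are still alive.
        event_log().push_back("release");
        assert(event_log().size() >= 2);
        std::printf("release ran after main (%zu events logged)\n",
                    event_log().size());
    }
};

// Static storage duration: typically constructed before main(), destroyed after it.
static ReleaseOnExit release_on_exit;
```

Linked into a program with any main(), the release line should only be printed once main has returned; while main is running the log holds just "hook registered". If the driver tears down its own globals before this destructor runs, a call into the driver at that point can hit already-destroyed state, which would be consistent with the crash described here.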
Issue
Running Halide programs with the Intel OpenCL driver on systems that have newer glibc versions installed results in the following error:
Reproduction steps
I will walk through the reproduction steps using the latest archlinux image from Docker and this version of intel-opencl-runtime from the AUR.
Get the latest archlinux image from dockerhub:
Upgrade system:
pacman --query glibc
clinfo
Download halide libraries and headers:
Download attached files halide_repro.zip
Compile gradient.cpp to a binary. Executing this binary will create a static library that uses the OpenCL target for the function gradient. The specific function and its implementation details don't matter for reproduction. This step is necessary only to link the actual program with the right halide_opencl_device_interface symbols.
gradient.cpp
This should create the libgradient.a static library, which defines the symbol halide_opencl_device_interface:
The actual program is basic02.cpp, which allocates some memory using Halide::Runtime::Buffer::device_malloc.
basic02.cpp
Some debugging notes
Get debug symbols for glibc-2.28-7:
Running the program with gdb shows the following stack trace on failure:
Based on the stack trace, the issue is caught only by newer glibc versions, probably because of improvements made in https://github.com/bminor/glibc/commit/6985865bc3ad5b23147ee73466583dd7fdf65892