Closed mcourteaux closed 2 months ago
The missing piece is that you can override halide_error to not abort, but just log, with halide_set_error_handler. The default halide_error doesn't return, but if it does return, the generated code cleans everything up and returns the error code. So you can write calling code like this if you want:
my_gpu_pipeline(...) || my_cpu_pipeline(...)
but it's probably better to check which error code was returned and handle it appropriately.
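A minimal sketch of that explicit check, for illustration only. Halide-generated pipelines return 0 on success, so in C++ a literal || would skip the second call exactly when the first one failed; spelling out the check avoids that. The two pipeline functions below are hypothetical stand-ins with no arguments (real generated pipelines take halide_buffer_t* parameters), and the GPU one is hard-wired to fail.

```cpp
#include <cstdio>

// Hypothetical stand-ins for two Halide-generated pipelines. Real ones take
// halide_buffer_t* arguments; like them, these return 0 on success and a
// nonzero error code on failure.
static int my_gpu_pipeline() { return -1; }  // pretend the GPU path failed
static int my_cpu_pipeline() { return 0; }   // CPU path succeeds

// Run the GPU pipeline first; fall back to the CPU one only if it failed.
// *used_cpu records which path produced the final result.
static int run_with_fallback(bool *used_cpu) {
    *used_cpu = false;
    int err = my_gpu_pipeline();
    if (err != 0) {
        std::fprintf(stderr, "GPU pipeline failed (code %d), using CPU\n", err);
        *used_cpu = true;
        err = my_cpu_pipeline();
    }
    return err;
}
```

This only makes sense once the error handler has been replaced with one that returns; with the default handler the process would abort before the error code ever reaches the caller.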
If I understand this mechanism well, that would mean that you can still use a multi-target static library and have the selection at runtime handled by the built-in mechanism relying on can_use_target_feature(), where you override it to return false for CUDA once startup has determined that CUDA doesn't work. The trick is in having the startup not crash, by also overriding halide_error(), so that my "is CUDA present" code can actually successfully determine it isn't.
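The startup probe described above could look roughly like the following sketch. It is a self-contained model, not Halide code: set_error_handler mirrors the shape of halide_set_error_handler (assumed here to return the previously installed handler), probe_cuda is a hypothetical stand-in for a compute-capability query that always behaves as if cuInit() failed, and can_use_cuda stands for the body of whatever override gets installed for the can-use-target check. In a real build these stand-ins would be replaced by the corresponding runtime functions.

```cpp
#include <cstdio>
#include <cstdlib>

// Stand-ins so this compiles without Halide. The handler type has the same
// shape as Halide's error handler: (user_context, message).
typedef void (*error_handler_t)(void *user_context, const char *msg);

// Models the default behaviour described in this thread: print, then abort.
static void aborting_handler(void *, const char *msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    std::abort();
}

static error_handler_t current_handler = aborting_handler;

// Models halide_set_error_handler: installs a handler, returns the old one.
static error_handler_t set_error_handler(error_handler_t handler) {
    error_handler_t previous = current_handler;
    current_handler = handler;
    return previous;
}

// Hypothetical probe standing in for a compute-capability query. It always
// behaves as if cuInit() failed: it reports through the current handler and,
// if that handler returns, hands back a nonzero error code.
static int probe_cuda(int *major, int *minor) {
    current_handler(nullptr, "CUDA: cuInit failed");
    *major = 0;
    *minor = 0;
    return -1;
}

// A handler that logs and returns, so errors surface as return codes.
static void logging_handler(void *, const char *msg) {
    std::fprintf(stderr, "non-fatal: %s\n", msg);
}

static bool cuda_available = false;

// Probe once at startup with the non-aborting handler installed, then put
// the previous handler back.
static void detect_cuda_at_startup() {
    error_handler_t previous = set_error_handler(logging_handler);
    int major = 0, minor = 0;
    cuda_available = (probe_cuda(&major, &minor) == 0);
    set_error_handler(previous);
}

// Body of a hypothetical can-use-target override: veto CUDA if the probe failed.
static int can_use_cuda() { return cuda_available ? 1 : 0; }
```

Restoring the previous handler after the probe is a design choice of this sketch; keeping the logging handler installed for the whole run would instead let pipelines themselves return error codes rather than abort.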
Thanks a lot Andrew! I put together something that detects it at startup, marks which ones are available and even allows you to specify a preferred one. Works nicely.
@abadams Maybe put a FAQ label on this one. I think it's relevant :)
I use generators for everything, and let's say I have set the target to host,host-cuda (for simplicity of this question). The static libraries that come out do not have a target-triple suffix in their name, so I was assuming that both versions are in there. Quickly checking with objdump -tC confirms that there are three object files (one for each target-triple and one wrapper).
Running my software with
CUDA_VISIBLE_DEVICES="" ./myprogram
does fail:
This seems to be orchestrated by halide_can_use_target_features():
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/can_use_target.cpp#L31-L67
This function right now clearly does not try to check if there is any GPU present or CUDA driver available at all. I know that this is not a bug per se, as it's defined to work like this in the documentation. Returning true means "may work":
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/HalideRuntime.h#L1465-L1486
How does production software deal with this? Is the halide_can_use_target_features() function overridden with something that actually checks if there is a GPU available? Even then, if cuInit() fails, I don't want to crash, but to get a meaningful error that I can report back to the user.
I was digging through the code to see how it does this, and it immediately crashes when you enter a pipeline and it calls initialize_kernels(). So, I was hoping I could query the halide_device_interface for the supported compute capability, such that I can get either a number back or one of the error codes, and then handle it from there to override the can_use_target_feature(). Like so:
However, the compute_capability() function also crashes the runtime if cuInit() fails. Not very helpful. Digging all the way through, it seems that return cuda_error() calls out to halide_error(), which calls abort(). So thus far, I haven't found a way to gently query the Halide runtime to see if the CUDA device is actually available without exploding. Using the halide_device_interface seemed like my best bet, as it gave the impression it was going to return error codes when stuff fails, but the deeply-nested abort() prevents this from happening.
It seems that most of the required ifs and error code definitions are there, but they aren't really useful now because of the abort():
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L1285-L1288
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L176-L178
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L330-L333
But the last one prints to a new HeapPrinter marked with ErrorPrinterType, which, upon destruction, flushes the error to halide_error(), which prints and calls abort().
I think that perhaps halide_cuda_compute_capability() should pass some sort of do-not-explode flag to the context-acquiring mechanism. However, any solution should be compatible with all device interfaces (Vulkan, OpenCL, ...) IMO.