halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Fallback to CPU pipelines when there is no GPU doesn't work. #8402

Closed mcourteaux closed 2 months ago

mcourteaux commented 2 months ago

I use generators for everything, and let's say I have set the target to host,host-cuda (simplified for this question). The static libraries that come out do not have a target-triple suffix in their name, so I assumed that both versions are in there. Quickly checking with objdump -tC confirms that there are three object files (one for each target triple, plus one wrapper).

Running my software with CUDA_VISIBLE_DEVICES="" ./myprogram fails:

Error: CUDA error: CUDA_ERROR_NO_DEVICE cuInit failed
zsh: IOT instruction (core dumped)  CUDA_VISIBLE_DEVICES="" ./neonraw/test/test_neonraw

This seems to be orchestrated by halide_can_use_target_features():

https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/can_use_target.cpp#L31-L67

This function clearly does not check whether any GPU is present or a CUDA driver is available at all. I know this is not a bug per se, as it is documented to work this way; returning true means "may work":

https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/HalideRuntime.h#L1465-L1486

How does production software deal with this? Is the halide_can_use_target_features() function overridden with something that actually checks whether a GPU is available? Even then, if cuInit() fails, I don't want to crash, but rather get a meaningful error that I can report back to the user.

I was digging through the code to see how it does this: the program crashes as soon as you enter a pipeline and it calls initialize_kernels(). So I was hoping I could query the halide_device_interface for the supported compute capability, get either a number or one of the error codes back, and then handle it from there to override halide_can_use_target_features(). Like so:

const halide_device_interface_t *itf = halide_cuda_device_interface();
int major, minor;
int result = itf->compute_capability(nullptr, &major, &minor);
printf("GPU compute capability: %d.%d (result_code: %d)\n", major, minor, result);

However, the compute_capability() function also crashes the runtime if cuInit() fails. Not very helpful. Digging all the way down, it seems that return cuda_error() calls out to halide_error(), which calls abort(). So far, I haven't found a way to gently query the Halide runtime to see whether the CUDA device is actually available without exploding. Using the halide_device_interface seemed like my best bet, as it gave the impression it was going to return error codes when stuff fails, but the deeply nested abort() prevents this from happening.

It seems that most of the required checks and error-code definitions are there, but they aren't really useful right now because of the abort():

https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L1285-L1288
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L176-L178
https://github.com/halide/Halide/blob/45518acbb386c7e54ba9189b4350e57cf574e5ac/src/runtime/cuda.cpp#L330-L333

But the last one prints to a new HeapPrinter marked with ErrorPrinterType, which, upon destruction, flushes the error to halide_error(), which prints and calls abort().

I think that perhaps halide_cuda_compute_capability() should pass some sort of do-not-explode flag to the context-acquiring mechanism.

However, any solution should, IMO, be compatible with all device interfaces (Vulkan, OpenCL, ...).

abadams commented 2 months ago

The missing piece is that you can override halide_error to not abort but just log, using halide_set_error_handler. The default halide_error doesn't return, but if it does return, the generated code cleans everything up and returns the error code. So you can write calling code like this if you want:

my_gpu_pipeline(...) || my_cpu_pipeline(...)

but it's probably better to check which error code was returned and handle it appropriately.

mcourteaux commented 2 months ago

If I understand this mechanism correctly, that means you can still use a multi-target static library and have the selection at runtime handled by the built-in mechanism relying on halide_can_use_target_features(), where you override it to return false for CUDA once you have determined during startup that CUDA doesn't work. The trick is to keep startup from crashing by also overriding halide_error(), so that my "is CUDA present" code can actually determine that it isn't.

mcourteaux commented 2 months ago

Thanks a lot, Andrew! I put together something that detects it at startup, marks which device APIs are available, and even allows you to specify a preferred one. Works nicely.

mcourteaux commented 2 months ago

@abadams Maybe put a FAQ label on this one. I think it's relevant :)