halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

`correctness_gpu_allocation_cache` test seems to be problematic under RustiCL/LLVMPipe #8335

Open LebedevRI opened 3 months ago

LebedevRI commented 3 months ago

Running that test under `HL_TARGET=host-opencl HL_JIT_TARGET=host-opencl OCL_ICD_VENDORS=rusticl.icd RUSTICL_ENABLE=llvmpipe` (using Mesa 24.1.2-1, as packaged in Debian sid), so that OpenCL actually runs on the CPU, the test takes a long time: over 5 minutes. I'm not sure whether it ever finishes. I'm also not sure whether this is a RustiCL bug or whether the test is simply ill-suited to running on a CPU.

LebedevRI commented 3 months ago

OK, with the loop reduced from 300 iterations to 30, it finishes in 525 seconds.
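For scale, the 30-iteration figure can be extrapolated back to the original 300 iterations. This is a rough sketch that assumes runtime scales linearly with the loop count, which has not been verified here; `extrapolate_seconds` is a hypothetical helper, not part of Halide or its test suite. Under that assumption the full test would take about 5250 seconds, close to an hour and a half.

```shell
#!/bin/sh
# Hypothetical helper (not part of Halide): extrapolate a runtime measured
# with a reduced loop count to the full loop count, ASSUMING the runtime
# scales linearly with the number of iterations.
extrapolate_seconds() {
  measured=$1   # seconds observed with the reduced loop count
  reduced=$2    # reduced iteration count (30 above)
  full=$3       # original iteration count (300)
  awk -v t="$measured" -v n="$reduced" -v N="$full" \
    'BEGIN { printf "%.0f\n", t * N / n }'
}

secs=$(extrapolate_seconds 525 30 300)
# Integer division, so minutes are rounded down.
echo "estimated full run: ${secs} s (~$((secs / 60)) min)"
```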

LebedevRI commented 3 months ago

On an actual GPU it seems to finish rather quickly:

```
CC=clang-17 CXX=clang++-17 cmake -DCMAKE_BUILD_TYPE=Release -DHalide_REQUIRE_LLVM_VERSION=17 -DLLVM_DIR=/usr/lib/llvm-17/lib/cmake/llvm -DTARGET_WEBASSEMBLY=OFF  -DHalide_TARGET="host-opencl" ..
<...>
time HL_TARGET=host-opencl HL_JIT_TARGET=host-opencl OCL_ICD_VENDORS=rusticl.icd RUSTICL_ENABLE=radeonsi test/correctness/correctness_gpu_allocation_cache
Runtime with cache: 1.148489
Without cache: 1.165373
Success!

real    0m11.515s
user    0m9.156s
sys     0m0.183s
```

I'll forward this to Mesa.

Hm, but when I run it manually as a single test, outside of the Debian package build, it isn't that slow either:

```
$ time HL_TARGET=host-opencl HL_JIT_TARGET=host-opencl OCL_ICD_VENDORS=rusticl.icd RUSTICL_ENABLE=llvmpipe test/correctness/correctness_gpu_allocation_cache
Runtime with cache: 7.260440
Without cache: 7.325961
Success!

real    1m1.768s
user    1m39.114s
sys     1m53.858s
$ time HL_TARGET=host-opencl HL_JIT_TARGET=host-opencl OCL_ICD_VENDORS=rusticl.icd RUSTICL_ENABLE=llvmpipe test/correctness/correctness_gpu_allocation_cache
Runtime with cache: 7.204123
Without cache: 7.236957
Success!

real    1m0.319s
user    1m39.195s
sys     1m44.556s
```
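The reported figures can be turned into ratios for a quick llvmpipe vs. radeonsi comparison. This sketch only does arithmetic on the first llvmpipe run and the radeonsi run above; the `ratio` helper is hypothetical. It gives a per-run ("Runtime with cache") slowdown of about 6.32x, a wall-clock slowdown of about 5.36x, and a (user+sys)/real ratio of about 3.45 for llvmpipe. That last ratio means roughly 3.45 CPU cores were busy on average, with more of that time spent in the kernel (sys) than in user space.

```shell
#!/bin/sh
# Hypothetical helper: print a / b rounded to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# "Runtime with cache": first llvmpipe run vs. the radeonsi run.
per_run=$(ratio 7.260440 1.148489)
# `real` wall-clock time: llvmpipe (1m1.768s) vs. radeonsi (0m11.515s).
wall=$(ratio 61.768 11.515)
# llvmpipe user + sys CPU time (1m39.114s + 1m53.858s), in seconds.
cpu=$(awk 'BEGIN { printf "%.3f\n", 99.114 + 113.858 }')
# Average number of CPU cores kept busy during the llvmpipe run.
busy=$(ratio "$cpu" 61.768)

echo "per-run slowdown: ${per_run}x"
echo "wall-clock slowdown: ${wall}x"
echo "llvmpipe (user+sys)/real: ${busy}"
```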