iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[ROCm] Custom gfx1100 kernel sample fails to build (clang-offload-bundler not found) #16899

Open kuhar opened 7 months ago

kuhar commented 7 months ago

Error:

➜ ninja all iree-test-deps && ctest -j32 --label-exclude '^driver=cuda|metal' --output-on-failure 
[0/2] Re-checking globbed directories...
[53/53] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-18 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/opt/rocm -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
clang-18: error: unable to execute command: Executable "clang-offload-bundler" doesn't exist!
clang-18: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)

My rocm installation is under /opt/rocm, the version is 5.7.1.

benvanik commented 7 months ago

is it a complete install? my Windows SDK has it: [screenshot]

kuhar commented 7 months ago

Yes, I even used the cursed amdgpu-pro installer.

➜ ls /opt/rocm/bin 
amdclang     amdclang-cpp  hipcc      hipcc_cmake_linker_helper  hipconfig.pl               hipdemangleatp      hipfc         hipvars.pm    roc-obj-extract  rocm_agent_enumerator
amdclang++   amdflang      hipcc.bin  hipconfig                  hipconvertinplace-perl.sh  hipexamine-perl.sh  hipify-clang  offload-arch  roc-obj-ls       rocminfo
amdclang-cl  amdlld        hipcc.pl   hipconfig.bin              hipconvertinplace.sh       hipexamine.sh       hipify-perl   roc-obj       rocm-smi 

raikonenfnu commented 7 months ago

I think @sogartar faced something similar? Can you try with my build script here https://gist.github.com/raikonenfnu/7d2843107929b161b12e56c057e8735d to see if the issue persists?

kuhar commented 7 months ago

@raikonenfnu can you first confirm where the clang-offload-bundler binary should be? Do you have it under /opt/rocm like Ben or installed system-wide?

kuhar commented 7 months ago

We may need to check for this during the cmake configuration step.
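A minimal sketch of what such a check could look like, written as a shell pre-flight script rather than actual CMake logic. The function names (find_tool, check_hip_toolchain) and the ROCM_ROOT variable are hypothetical; the two tool names are the ones clang reports as missing in this thread, and the search directories are assumptions based on the paths mentioned here.

```shell
#!/usr/bin/env bash
# Sketch: locate helper tools that clang invokes when building HIP kernels.

# find_tool NAME [DIR...]
# Prints the first match and returns 0, or returns non-zero if nothing is found.
find_tool() {
  local name="$1"; shift
  local dir
  # Explicit candidate directories take priority over PATH.
  for dir in "$@"; do
    if [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  # Fall back to a regular PATH lookup.
  command -v "$name"
}

# Assumed default; the thread uses /opt/rocm.
ROCM_ROOT="${IREE_ROCM_PATH:-/opt/rocm}"

# Returns 0 only if every required tool can be located.
check_hip_toolchain() {
  local missing=0 tool
  for tool in clang-offload-bundler llvm-objcopy; do
    if ! find_tool "$tool" "$ROCM_ROOT/llvm/bin" "$ROCM_ROOT/bin" >/dev/null; then
      echo "missing: $tool (looked in $ROCM_ROOT/llvm/bin, $ROCM_ROOT/bin, PATH)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_hip_toolchain before ninja would surface a missing tool up front instead of partway through the build.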

raikonenfnu commented 7 months ago

I only have it in /opt/rocm/llvm/bin/, not system-wide. IIRC the clang commands to generate the bitcode should not need clang-offload-bundler at all.

I also do not have clang-offload-bundler on my env and was able to compile.

raikonenfnu commented 7 months ago

Oh wait, you are talking about macrokernel not microkernel, so my previous assumption/comments might not be correct here. The previous comments were more about microkernel. I need to check a bit more about the samples macrokernel.

I think it may be the --rocm-path option? I was able to compile hsaco/co with https://github.com/raikonenfnu/macroHipKernel/blob/main/generate_hsaco.sh#L2-L4

Perhaps missing a nogpulib option?

raikonenfnu commented 7 months ago

@kuhar Was able to repro your issue on my system as well. But if I specify export IREE_ROCM_PATH=/opt/rocm, then my error would be:

(EDIT: Deleted log from using -nogpulib)

(EDIT: this one actually works if we point to where clang-offload-bundler lives, which is /opt/rocm/llvm/bin) Seems like if we append the ROCm LLVM path to PATH, it compiles OK:

PATH=$PATH:/opt/rocm/llvm/bin /home/stanley/nod/iree-build-notrace/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/opt/rocm -fuse-cuid=none -O3 /home/stanley/nod/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/stanley/nod/iree-build-notrace/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
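
To make that workaround persistent for a shell session without stacking duplicate entries, one option is a small helper like the sketch below. The path_append name is hypothetical, and /opt/rocm/llvm/bin is the location reported in this thread, which may differ on other installs.

```shell
# Sketch: append a directory to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;     # append so existing tools keep priority
  esac
}

# Directory that holds clang-offload-bundler on the installs discussed here.
path_append /opt/rocm/llvm/bin
export PATH
```

Appending rather than prepending keeps any system tools ahead of the ROCm ones in lookup order.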

kuhar commented 7 months ago

Thanks, with export PATH="$PATH:/opt/rocm/llvm/bin" set, it makes more progress and then errors out with:

➜ ninja all                                                                                                               
[0/2] Re-checking globbed directories...
[57/332] Generating rocm_executable_cache_test.bin from executable_cache_test.mlir
FAILED: runtime/plugins/hal/drivers/rocm/cts/rocm_executable_cache_test.bin /home/jakub/iree/build/relass/runtime/plugins/hal/drivers/rocm/cts/rocm_executable_cache_test.bin 
cd /home/jakub/iree/build/relass/runtime/plugins/hal/drivers/rocm/cts && /home/jakub/iree/build/relass/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --compile-mode=hal-executable --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx908 /home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir -o rocm_executable_cache_test.bin --iree-hal-executable-object-search-path=\"/home/jakub/iree/build/relass\"
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: cannot find ROCM bitcode files. Check your installation consistency and in the worst case, set --iree-rocm-bc-dir= to a path on your system.
hal.executable.source public @executable {
^
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: failed to serialize executable for target backend rocm
hal.executable.source public @executable {
^
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: failed to serialize executables
hal.executable.source public @executable {
^
[58/332] Generating rocm_command_buffer_dispatch_test.bin from command_buffer_dispatch_test.mlir

I set both IREE_ROCM_PATH as the cmake variable and exported it as an env var. What am I missing @raikonenfnu?

Separately from solving this, why do we even build this test data in the all target? I'd assume it should only be a dependency for iree-test-deps, no?

kuhar commented 7 months ago

OK, it does work after switching the ROCm installation from the amdgpu-pro installer to https://github.com/nod-ai/TheRock/releases/tag/nightly-staging-20240328.41 , setting -DIREE_ROCM_PATH, and doing a clean build.

kuhar commented 7 months ago

The last remaining issue is the following error:

➜  ninja iree-test-deps       
[0/2] Re-checking globbed directories...
[1266/1266] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/home/jakub/bin/therock -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
In file included from /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu:7:
In file included from /home/jakub/bin/therock/include/hip/hip_runtime.h:62:
In file included from /home/jakub/bin/therock/include/hip/amd_detail/amd_hip_runtime.h:432:
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:194:27: error: use of undeclared identifier 'max'; did you mean 'fmax'?
  194 |   double __logbw = _LOGBd(_fmaxd(_ABSd(__c), _ABSd(__d)));
      |                           ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:45:16: note: expanded from macro '_fmaxd'
   45 | #define _fmaxd max
      |                ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_math_forward_declares.h:73:19: note: 'fmax' declared here
   73 | __DEVICE__ double fmax(double, double);
      |                   ^
In file included from /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu:7:
In file included from /home/jakub/bin/therock/include/hip/hip_runtime.h:62:
In file included from /home/jakub/bin/therock/include/hip/amd_detail/amd_hip_runtime.h:432:
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:227:26: error: use of undeclared identifier 'max'; did you mean 'fmax'?
  227 |   float __logbw = _LOGBf(_fmaxf(_ABSf(__c), _ABSf(__d)));
      |                          ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:46:16: note: expanded from macro '_fmaxf'
   46 | #define _fmaxf max
      |                ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_math_forward_declares.h:74:18: note: 'fmax' declared here
   74 | __DEVICE__ float fmax(float, float);
      |                  ^
2 errors generated when compiling for gfx1100.
ninja: build stopped: subcommand failed

kuhar commented 7 months ago

@raikonenfnu @antiagainst should we disable these rocm kernels and make them experimental? They don't seem to work out of the box on a typical linux installation but are included in the main ninja targets all (sic!) and iree-test-deps.

kuhar commented 7 months ago

Ping. This still doesn't build for me. After manually patching the cuda kernel, I'm hitting an issue with another tool missing from path:

➜  ninja all iree-test-deps          
[0/2] Re-checking globbed directories...
[638/2136] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/home/jakub/bin/therock -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
/home/jakub/bin/therock/bin/clang-offload-bundler: error: unable to find 'llvm-objcopy' in path
clang-19: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
[641/2136] Building CXX object tracy/CMakeFiles/IREETracyProfiler.dir/__/__/__/third_party/tracy/profiler/src/main.cpp.o
ninja: build stopped: subcommand failed.

Seems like this needs a very specific system-wide installation.

hanhanW commented 5 months ago

I'm hitting the same issue..

I tried -DIREE_ROCM_PATH=/opt/rocm/llvm/bin, but the cmake result says that the hip runtime cannot be found.

-- hip runtime cannot be found in /opt/rocm/llvm/bin.
          Please try setting IREE_ROCM_PATH to rocm directory.
          Ukernels will not be compiled.

I thought that was fine, so I went with export PATH="$PATH:/opt/rocm/llvm/bin". It still complains: cannot find ROCM bitcode files. Check your installation consistency and in the worst case, set --iree-rocm-bc-dir= to a path on your system.

Then I tried configuring with -DIREE_ROCM_PATH=/opt/rocm. The hip runtime issue is gone from the cmake output. Without setting the env var, the Executable "clang-offload-bundler" doesn't exist! error showed up.

If I go with the env config (i.e., export PATH="$PATH:/opt/rocm/llvm/bin"), it goes back to complaining about error: cannot find ROCM bitcode files.

@kuhar @raikonenfnu What is the actual cmake flag and env var that you're using?

benvanik commented 5 months ago

the rocm path should be to rocm - like /opt/rocm/ - not the llvm bin dir (that may still not work, but I'm pretty sure trying to specify llvm/bin/ won't work)
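
A rough way to sanity-check a candidate value before passing it as IREE_ROCM_PATH is sketched below. It assumes that a ROCm root contains include/hip/hip_runtime.h, which matches the include chain quoted earlier in this thread; IREE's own CMake check may look for something different. Both function names are hypothetical.

```shell
# Returns 0 if DIR looks like a ROCm root rather than a tool subdirectory.
# Assumption: a root contains include/hip/hip_runtime.h.
is_rocm_root() {
  [ -f "$1/include/hip/hip_runtime.h" ]
}

# Walk upward from a path such as /opt/rocm/llvm/bin until a ROCm root is found.
# Prints the root and returns 0, or returns 1 if none is found.
guess_rocm_root() {
  local dir="${1%/}"
  while [ -n "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    if is_rocm_root "$dir"; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir="$(dirname "$dir")"
  done
  return 1
}
```

For example, guess_rocm_root /opt/rocm/llvm/bin would print /opt/rocm on an install laid out like the ones described here, and that value is what -DIREE_ROCM_PATH expects.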

hanhanW commented 5 months ago

The cmake flag seems to be off. I explicitly return /opt/rocm in the code, like

https://github.com/iree-org/iree/blob/3803de50d93eac83328005962fe441c2d610bb2e/compiler/plugins/target/ROCM/ROCMTarget.cpp#L112-L114

After doing that, it complains: AMD bitcode module is required by this module but was not found at /opt/rocm/ocml.bc. I found that the file is located at /opt/rocm/lib/llvm/lib/clang/17/lib/amdgcn/bitcode/ocml.bc. So I created a symbolic link (i.e., /opt/rocm/ocml.bc) pointing to where it is. Then it compiles. However, I don't see any e2e tests related to rocm. There is no hip or rocm in the log of ctest -R tests/e2e. Are we able to run e2e tests for the rocm backend?

hanhanW commented 5 months ago

It looks like we only test compilation but not execution for the rocm backend? @ScottTodd is my understanding correct?

https://github.com/iree-org/iree/blob/0467f48e978b014fb04c3cdd691a230749679f4b/tests/e2e/stablehlo_ops/CMakeLists.txt#L691-L695

ScottTodd commented 5 months ago

The "rocm" driver is experimental and scheduled to be deleted. The "hip" driver is stable and is tested.

hanhanW commented 5 months ago

I see, I can run tests now! Thanks for the pointer!

❯ ctest -R tests/e2e/stablehlo_ops/check_hip
Test project /home/nod/iree/build
      Start 1672: iree/tests/e2e/stablehlo_ops/check_hip_stream_abs.mlir
 1/61 Test #1672: iree/tests/e2e/stablehlo_ops/check_hip_stream_abs.mlir .....................   Passed    0.96 sec
      Start 1673: iree/tests/e2e/stablehlo_ops/check_hip_stream_add.mlir
 2/61 Test #1673: iree/tests/e2e/stablehlo_ops/check_hip_stream_add.mlir .....................   Passed    0.25 sec
      Start 1674: iree/tests/e2e/stablehlo_ops/check_hip_stream_batch_norm_inference.mlir
 3/61 Test #1674: iree/tests/e2e/stablehlo_ops/check_hip_stream_batch_norm_inference.mlir ....   Passed    0.23 sec
      Start 1675: iree/tests/e2e/stablehlo_ops/check_hip_stream_bitcast_convert.mlir
 4/61 Test #1675: iree/tests/e2e/stablehlo_ops/check_hip_stream_bitcast_convert.mlir .........   Passed    0.23 sec
      Start 1676: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast.mlir
 5/61 Test #1676: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast.mlir ...............   Passed    0.24 sec
      Start 1677: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast_add.mlir
 6/61 Test #1677: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast_add.mlir ...........   Passed    0.23 sec
      Start 1678: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast_in_dim.mlir
 7/61 Test #1678: iree/tests/e2e/stablehlo_ops/check_hip_stream_broadcast_in_dim.mlir ........   Passed    0.25 sec
      Start 1679: iree/tests/e2e/stablehlo_ops/check_hip_stream_clamp.mlir
 8/61 Test #1679: iree/tests/e2e/stablehlo_ops/check_hip_stream_clamp.mlir ...................   Passed    0.28 sec
      Start 1680: iree/tests/e2e/stablehlo_ops/check_hip_stream_compare.mlir
 9/61 Test #1680: iree/tests/e2e/stablehlo_ops/check_hip_stream_compare.mlir .................   Passed    0.56 sec
      Start 1681: iree/tests/e2e/stablehlo_ops/check_hip_stream_complex.mlir
10/61 Test #1681: iree/tests/e2e/stablehlo_ops/check_hip_stream_complex.mlir .................   Passed    0.25 sec
      Start 1682: iree/tests/e2e/stablehlo_ops/check_hip_stream_concatenate.mlir
11/61 Test #1682: iree/tests/e2e/stablehlo_ops/check_hip_stream_concatenate.mlir .............   Passed    0.26 sec
      Start 1683: iree/tests/e2e/stablehlo_ops/check_hip_stream_constant.mlir
12/61 Test #1683: iree/tests/e2e/stablehlo_ops/check_hip_stream_constant.mlir ................   Passed    0.23 sec
      Start 1684: iree/tests/e2e/stablehlo_ops/check_hip_stream_convert.mlir
13/61 Test #1684: iree/tests/e2e/stablehlo_ops/check_hip_stream_convert.mlir .................   Passed    0.34 sec
      Start 1685: iree/tests/e2e/stablehlo_ops/check_hip_stream_convolution.mlir

kuhar commented 5 months ago

@hanhanW how did you fix this?

hanhanW commented 5 months ago

I don't have a clean way. It needs a local patch. What I did is:

On the IREE side, put return /opt/rocm directly in

https://github.com/iree-org/iree/blob/3803de50d93eac83328005962fe441c2d610bb2e/compiler/plugins/target/ROCM/ROCMTarget.cpp#L112-L114

Then I hit an error about missing ocml.bc. After running locate ocml.bc (or cd /opt/rocm/; find . | grep -i ocml.bc), I found the location of the file. And I just added a symbolic link for ocml.bc. E.g.,

sudo -s
cd /opt/rocm
ln -s ./lib/llvm/lib/clang/17/lib/amdgcn/bitcode/ocml.bc ocml.bc

Then you should be able to compile and run tests, e.g., ctest -R tests/e2e/stablehlo_ops/check_hip.

If the cmake flag is fixed, I will no longer need the local patch I guess.

update: we also need export PATH="$PATH:/opt/rocm/llvm/bin" to make it work.
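
The locate/find step above can be wrapped in a small helper, sketched here under the assumption that the bitcode files sit somewhere under the ROCm root (the clang/17 path in this thread is version-specific, so it is searched for rather than hard-coded). The function name find_rocm_bitcode_dir is hypothetical.

```shell
# Print the directory that contains ocml.bc under a ROCm root, or return 1.
# The default root is the one used in this thread.
find_rocm_bitcode_dir() {
  local root="${1:-/opt/rocm}" hit
  # -H follows the starting path if it is a symlink (e.g. /opt/rocm -> /opt/rocm-x.y.z).
  hit="$(find -H "$root" -name ocml.bc -print 2>/dev/null | head -n 1)"
  [ -n "$hit" ] || return 1
  dirname "$hit"
}

# Possible use instead of a symlink; untested here, based only on the hint in
# the compiler error message quoted earlier:
#   iree-compile ... --iree-rocm-bc-dir="$(find_rocm_bitcode_dir /opt/rocm)"
```

Whether passing the directory through --iree-rocm-bc-dir fully replaces the symlink and the local patch was not verified in this thread.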