taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Add support for AMDGPU #6434

Open galeselee opened 1 year ago

galeselee commented 1 year ago

Purpose

I find that Taichi users have a demand for AMDGPU support (related issue: https://github.com/taichi-dev/taichi/issues/4586). There is also a discussion of an AMD GPU backend in https://github.com/taichi-dev/taichi/issues/412. Thus, I would like to add an AMDGPU backend to the compiler so that my 6900XT can be utilized.

Solution

The goal is to give Taichi preliminary support for AMDGPU via LLVM 14 and ROCm. I started by completing the following steps, one by one:

  1. Add the relevant dependencies in CMake.
  2. Build a dedicated version of the Taichi runtime module (taichi/runtime/llvm/runtime_module) for AMDGPU. During the implementation, I found that I couldn't reuse the CPU's LLVM module directly the way Taichi does for CUDA; I needed to do some address-space conversions, so I temporarily used hipcc to handle these issues. (The final approach taken is to hack the address space, so we don't need an extra runtime.cpp.)
  3. Implement the interaction between Taichi and the ROCm library functions. The CUDA backend uses some driver functions; it is convenient to replace these with functions provided by ROCm, such as hipMemcpy and hipModuleLaunchKernel (a rough sketch of this kind of HIP call appears after this list).
  4. Add a codegen skeleton (like taichi/codegen/cuda).
  5. Convert LLVM IR to AMDGPU ISA / hsaco (ref link).
  6. Add unit test configuration for AMDGPU.
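
For step 3, a minimal sketch of the kind of HIP runtime interaction involved is below. It is purely illustrative (Taichi's real runtime binding is written in C++ and loads the ROCm libraries dynamically), assumes a ROCm installation that provides libamdhip64.so on the loader path, and only exercises hipMalloc / hipMemcpy / hipFree:

    import ctypes

    # Illustrative only: the actual Taichi runtime interaction is in C++.
    # Assumes libamdhip64.so from a ROCm install is on the loader path.
    hip = ctypes.CDLL("libamdhip64.so")

    # hipMemcpyKind values from the HIP runtime headers.
    hipMemcpyHostToDevice = 1
    hipMemcpyDeviceToHost = 2

    def check(status):
        # Every HIP runtime call returns a hipError_t; 0 means hipSuccess.
        if status != 0:
            raise RuntimeError(f"HIP call failed with error code {status}")

    nbytes = 4096
    host_src = ctypes.create_string_buffer(b"\x07" * nbytes, nbytes)
    host_dst = ctypes.create_string_buffer(nbytes)

    # Allocate device memory, round-trip the buffer through it, then free it.
    dev_ptr = ctypes.c_void_p()
    check(hip.hipMalloc(ctypes.byref(dev_ptr), ctypes.c_size_t(nbytes)))
    check(hip.hipMemcpy(dev_ptr, host_src, ctypes.c_size_t(nbytes), hipMemcpyHostToDevice))
    check(hip.hipMemcpy(host_dst, dev_ptr, ctypes.c_size_t(nbytes), hipMemcpyDeviceToHost))
    check(hip.hipFree(dev_ptr))

    assert host_dst.raw == host_src.raw
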
galeselee commented 1 year ago

It seems better to add the CMake changes at the end (after step 5).

galeselee commented 1 year ago

Here is the task list.

ITCJ commented 1 year ago

Amazing work.

I'm also considering this task since our new HPC is using a Sugon gtx106 (known as Vega 20). I am quite interested; is there any possibility to cooperate?

Besides, are you also working on the ASC 2022 competition? I noticed that you forked DeePMD-kit, which is one of the tasks of ASC 2022.

galeselee commented 1 year ago

> I'm also considering this task since our new HPC is using a Sugon gtx106 (known as Vega 20). I am quite interested; is there any possibility to cooperate?

@ITCJ Certainly! I have now implemented a prototype AMDGPU backend based on an RDNA2-arch card (don't worry, there is still a lot of work to do). If you are interested, here is the code. By the way, I don't know anything about the Sugon gtx106 :-). To my knowledge, GTX is a product line from NVIDIA. Do you mean a DCU?

ITCJ commented 1 year ago

> I'm also considering this task since our new HPC is using a Sugon gtx106 (known as Vega 20). I am quite interested; is there any possibility to cooperate?
>
> @ITCJ Certainly! I have now implemented a prototype AMDGPU backend based on an RDNA2-arch card (don't worry, there is still a lot of work to do). If you are interested, here is the code. By the way, I don't know anything about the Sugon gtx106 :-). To my knowledge, GTX is a product line from NVIDIA. Do you mean a DCU?

@GaleSeLee Yeah, it's a Sugon DCU. I rechecked rocminfo and found it is gfx906. I was confused by this for a while and never checked rocminfo again, ORZ.

  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc-:xnack-

Also, would you mind if I added you on WeChat? I have sent my WeChat ID and QR code to your edu email.

liuyunpu commented 11 months ago

Is there any plan to support the CDNA or CDNA2 architecture? I hope to run my Taichi code on MI-series GPUs.

galeselee commented 11 months ago

> Is there any plan to support the CDNA or CDNA2 architecture? I hope to run my Taichi code on MI-series GPUs.

Actually, you can run your Taichi code on a GPU with the CDNA or CDNA2 architecture with a little modification. (I have tested some cases on an MI210.)

liuyunpu commented 11 months ago

How? I thought the current Taichi doesn't support AMD GPUs.

galeselee commented 11 months ago

> How? I thought the current Taichi doesn't support AMD GPUs.

It looks like the AMDGPU backend hasn't been released yet; you can either do the compilation yourself (I can write up how to compile it if you need help) or try using the Vulkan backend.
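
In case it helps in the meantime, here is a minimal sketch of the Vulkan fallback, assuming a released Taichi wheel with the Vulkan backend enabled and a working Vulkan driver for the AMD card:

    import taichi as ti

    # Fall back to the Vulkan backend, which can drive an AMD GPU through its
    # Vulkan driver even when the AMDGPU/ROCm backend is not compiled in.
    ti.init(arch=ti.vulkan)

    x = ti.field(dtype=ti.f32, shape=8)

    @ti.kernel
    def fill():
        for i in x:
            x[i] = i * 2.0

    fill()
    print(x.to_numpy())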

galeselee commented 10 months ago

> How? I thought the current Taichi doesn't support AMD GPUs.

  1. The current Taichi AMDGPU backend is based on ROCm, so the first thing you need to do is install ROCm. (The official doc may be helpful.)

  2. Build LLVM for the AMDGPU backend.
     2.1. Download the LLVM source code.
     2.2. Compile:

    cmake .. -DLLVM_ENABLE_RTTI:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_INCLUDE_BENCHMARKS=OFF -DCMAKE_INSTALL_PREFIX=llvm_binary_path
    make -j `nproc`
    make install

     2.3. Check: there should be four folders (bin, include, lib, and share) under CMAKE_INSTALL_PREFIX.

  3. Download the Taichi source code:

    git clone --recursive https://github.com/taichi-dev/taichi.git

  4. Prepare the environment to compile Taichi:

    virtualenv -p python3 your_env_path
    pip install -r requirements_dev.txt

  5. Compile:

    TAICHI_CMAKE_ARGS="-DTI_WITH_VULKAN=OFF -DTI_WITH_OPENGL=OFF -DTI_BUILD_TESTS=ON -DTI_BUILD_EXAMPLES=OFF -DCMAKE_PREFIX_PATH=llvm_binary_path/lib/cmake -DCMAKE_CXX_COMPILER=clang++ -DTI_WITH_AMDGPU=ON" python setup.py develop

     Please choose the backends to compile according to your needs (e.g. if you need the Vulkan backend, pass -DTI_WITH_VULKAN=ON).

  6. Test:

    pip install -r requirements_test.txt
    cd tests
    python run_test.py -a amdgpu -k mpm88

     If your output is similar to the screenshot below, you have finished the compilation (a minimal smoke test is also sketched after these steps).

    (screenshot of the mpm88 test output)

Please enjoy!
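
As a quick sanity check beyond run_test.py, here is a minimal smoke test, assuming the build above succeeded and exposes the amdgpu arch as ti.amdgpu; run it inside the virtualenv used for python setup.py develop:

    import taichi as ti

    # Select the freshly built AMDGPU backend; Taichi prints the chosen arch at startup.
    ti.init(arch=ti.amdgpu)

    n = 1024
    a = ti.field(dtype=ti.f32, shape=n)

    @ti.kernel
    def axpb(alpha: ti.f32, beta: ti.f32):
        for i in a:
            a[i] = alpha * i + beta

    axpb(0.5, 1.0)
    assert abs(a[n - 1] - (0.5 * (n - 1) + 1.0)) < 1e-4
    print("AMDGPU backend looks functional")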

expenses commented 10 months ago

I've managed to build Taichi for AMDGPU, but one problem I ran into is that recent (5.3+) versions of ROCm require PCIe atomics, which are not available on my setup. The most recent version, 5.7, includes a workaround for setups without PCIe atomics: https://rocm.docs.amd.com/en/latest/release.html#non-hostcall-hip-printf.

I had a go at building Taichi against LLVM 17, which is the LLVM version that ROCm 5.7 works with. I haven't been able to test these changes because they require the ROCm 5.7 runtime to be installed, and that isn't in the Arch Linux repositories yet. Here are the changes I've had to make: https://github.com/expenses/taichi/commit/800c6936922360b914bf48e5b749f6f2eeae9834.

I'm not sure if this information is super useful, but it might help some people.

GZGavinZhao commented 2 months ago

I'd just like to note that the AMDGPU backend (TI_ARCH=amdgpu) is already working with ROCm 6.0 on my gfx1032 GPU. It passes all (Python) tests and in many cases is more stable / has fewer crashes than the Vulkan and OpenGL backends (albeit at the cost of a slightly slower startup).

Currently the latest LLVM version Taichi officially supports is LLVM 15. I have LLVM 16 working locally, but LLVM 17+ needs some thought because of the need (or not?) to migrate to LLVM's new pass manager infrastructure (e.g. PassManagerBuilder no longer exists).

galeselee commented 2 months ago

@GZGavinZhao Thanks for sharing; I'm very glad to hear this. I have previous experience migrating from a lower version of LLVM to 14. Some of the LLVM-related code needs to be changed according to the LLVM docs, because of LLVM API changes like the PassManagerBuilder you mentioned.