xuhuisheng / rocm-gfx803


Could we update the Torch package here? #24

Closed: redthing1 closed this issue 1 year ago

redthing1 commented 1 year ago

Hello,

Thanks to your incredible work, I'm able to run the Torch build here with ROCm on an RX 580 on Arch.

Your PyTorch build uses Python 3.8 and Torch 1.11.

I was hoping for Python 3.9 and Torch 1.12+.

I'd like to use a newer version of PyTorch. How do we build it? I'm willing to help.

xuhuisheng commented 1 year ago

I'm a little busy right now.

Here is the PyTorch build script. Give it a try, and enjoy yourself.

https://github.com/xuhuisheng/rocm-build/blob/master/gfx803/README.md#workaround-1

I would also suggest using the ROCm-customized PyTorch: https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13

redthing1 commented 1 year ago

Thank you very much.

By the way, @xuhuisheng, I tried your Blender patch on v3.4. I was able to get it to compile and run gfx803 kernels, but the rendering is all wrong:

Ignore the black box; it's just a redaction. But look at all these weird patterns and corruption.

[Screenshot: 2022-12-20_00-40_a, the corrupted Blender 3.4 render]

Here are my patches (very small):

patchset.zip

Note that I did try your Blender 3.3 build for gfx803 ROCm, and it works very well for me. Only the Blender 3.4 that I built from source is broken.

Here are my blender build flags:

  local PYTHON_VER=3.10

  cmake \
    -Bbuild \
    -GNinja \
    -Cbuild_files/cmake/config/blender_release.cmake \
    -DWITH_CYCLES_HIP_BINARIES=ON \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_BUILD_TYPE=Release \
    -DWITH_INSTALL_PORTABLE=OFF \
    -DWITH_PYTHON_INSTALL=OFF \
    -DPYTHON_VERSION=$PYTHON_VER \
    -DPYTHON_LIBPATH=/usr/lib \
    -DPYTHON_LIBRARY=python$PYTHON_VER \
    -DPYTHON_INCLUDE_DIRS=/usr/include/python$PYTHON_VER \
    -DCMAKE_CXX_FLAGS="-I /usr/include/python$PYTHON_VER"
  cmake --build build

xuhuisheng commented 1 year ago

I'm not sure whether the -amdgpu-early-inline-all flag is enough for gfx803. In Blender 3.3, I had to change noinline to inline manually.

But it's normal to get different results across ROCm upgrades. The ROCm team doesn't run test cases for gfx803, so it keeps getting broken.

redthing1 commented 1 year ago

Well, the Blender 3.3 build you made works fine on the same machine, so I don't think it's ROCm; the kernel did not change much. I think it might just be the Blender build, as you say.

Where is the noinline? I don't see it in here: https://github.com/blender/blender/tree/master/intern/cycles/device/hip

Could you help me find it?

xuhuisheng commented 1 year ago

My modification is very simple.

  1. Modify intern/cycles/kernel/device/hip/compat.h: change every __noinline__ to __inline__.
  2. Modify intern/cycles/device/hip/util.h: change 9 to 8 so that Cycles can use gfx803.
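The two edits above can be scripted. The following is a minimal POSIX shell sketch, not xuhuisheng's actual patch. It assumes GNU sed (for `-i`) and assumes that util.h contains the literal text `major >= 9`; the thread only says "change 9 to 8", so inspect the file for your Blender version first. The function name `patch_cycles_for_gfx803` is hypothetical. The script checks both patterns before editing anything, so a mismatch or an already-patched tree produces an error instead of a half-applied change.

```shell
#!/bin/sh
# Hypothetical helper: applies the two gfx803 edits to a Blender source tree.
# Assumptions: GNU sed (for -i) and util.h containing the text "major >= 9".

patch_cycles_for_gfx803() {
  src="$1"
  compat="$src/intern/cycles/kernel/device/hip/compat.h"
  util="$src/intern/cycles/device/hip/util.h"

  # Validate both files up front so we never leave the tree half-patched.
  if ! grep -q '__noinline__' "$compat"; then
    echo "no __noinline__ found in $compat (already patched?)" >&2
    return 1
  fi
  # ASSUMPTION: the HIP device check is written as "major >= 9".
  if ! grep -q 'major >= 9' "$util"; then
    echo "expected 'major >= 9' not found in $util" >&2
    return 1
  fi

  # Step 1: replace every __noinline__ with __inline__ in the HIP compat header.
  sed -i 's/__noinline__/__inline__/g' "$compat"

  # Step 2: lower the architecture threshold from 9 to 8 so gfx803 is accepted.
  sed -i 's/major >= 9/major >= 8/' "$util"
}
```

Usage: source the script (for example `. ./patch_gfx803.sh`, a hypothetical filename), run `patch_cycles_for_gfx803 /path/to/blender`, then rebuild with the cmake flags quoted earlier in the thread.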

By the way, the kernel code throws compilation errors with ROCm 5.2 on my PC. The same code compiles successfully with ROCm 5.3 on the same PC and renders properly.

I'm afraid there are changes in rocm-llvm that I cannot locate. That's why I said ROCm 5.4 might include changes that break gfx803.

redthing1 commented 1 year ago

@xuhuisheng I did read your instructions, but look:

https://github.com/blender/blender/tree/v3.4.1/intern/cycles/device/hip

There is no compat.h here.

I applied the util patch, and it works fine.

xuhuisheng commented 1 year ago

Sorry for the typo.

Please look in the kernel directory: https://github.com/blender/blender/blob/v3.4.1/intern/cycles/kernel/device/hip/compat.h