xuhuisheng / rocm-gfx803


Pytorch2.0.1 Rocm5.5 support #31

Open aseok opened 1 year ago

aseok commented 1 year ago

Hi, will you also release this version?

Tokoshie commented 1 year ago

My new build: https://github.com/Tokoshie/pytorch-gfx803/releases/tag/v2.1.0a0

brsh1 commented 10 months ago

Should it work for gfx900?

xuhuisheng commented 10 months ago

Hi guys, I just ran into a PCIe atomics problem on gfx906. AMD said that when using a gfx9x GPU, the CPU and motherboard have to support PCIe atomics; otherwise PyTorch 2.x will return wrong results.

The good news is that if PCIe atomics are supported, PyTorch 2.x runs properly on gfx9.

So gfx900 should be fine; you can use the officially released pytorch-2.x.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6/

brsh1 commented 10 months ago

Now I wonder: if I use the GPU via passthrough on my Xen server, should I expect PCIe atomics issues because of the virtualization layer?

xuhuisheng commented 10 months ago

@brsh1 I haven't used GPU passthrough yet, so I can only suggest you give it a try.
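One way to approach the passthrough question is to compare what the GPU reports for PCIe atomics on the host and inside the guest. The sketch below parses the text of `lspci -vvv` for a single device and pulls out the `AtomicOpsCap` and `AtomicOpsCtl` flags. It assumes a pciutils version that prints those fields; the function name `parse_atomic_ops` and the sample text are illustrative, not from this thread, and the flags describe the device only, not every bridge between it and the CPU.

```python
import re

def parse_atomic_ops(lspci_text):
    """Collect AtomicOpsCap/AtomicOpsCtl flags from `lspci -vvv` text.

    Returns a dict such as {"32bit": True, "ReqEn": False}.
    An empty dict means the fields were not printed at all.
    """
    flags = {}
    for line in lspci_text.splitlines():
        line = line.strip()
        if line.startswith(("AtomicOpsCap:", "AtomicOpsCtl:")):
            values = line.split(":", 1)[1]
            # Each flag is printed as NAME+ (set) or NAME- (clear).
            for name, sign in re.findall(r"(\w+)([+-])", values):
                flags[name] = (sign == "+")
    return flags

# Hypothetical excerpt, shaped like lspci output for a PCIe endpoint.
SAMPLE = """
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                 AtomicOpsCtl: ReqEn+
"""

print(parse_atomic_ops(SAMPLE))
```

To use it on a real system, capture the output of `sudo lspci -vvv -s <bus:dev.fn>` for the GPU on the host and again inside the VM, then compare the two dicts.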

brsh1 commented 10 months ago

The reason I'm asking is that I get the error "can't initialize nvml" when trying the version you suggested.
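Since NVML is an NVIDIA library, one thing worth ruling out (an assumption, not something confirmed in this thread) is that a CUDA wheel was installed instead of the ROCm one. A minimal sketch that reports which backend the installed torch was built for; `describe_torch_build` is a hypothetical helper name, and it returns `None` when torch is not installed:

```python
def describe_torch_build():
    """Report which GPU backend the installed torch wheel targets."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    return {
        "torch": torch.__version__,
        # Set on ROCm builds, None on CUDA and CPU-only builds.
        "hip": getattr(torch.version, "hip", None),
        # Set on CUDA builds, None on ROCm and CPU-only builds.
        "cuda": getattr(torch.version, "cuda", None),
        "gpu_available": torch.cuda.is_available(),
    }

print(describe_torch_build())
```

On a ROCm wheel the `hip` entry should be a version string and `cuda` should be `None`; the reverse suggests the wrong index URL was used for the install.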

brsh1 commented 10 months ago

Just to make sure: which ROCm version should I be using? 5.6?

xuhuisheng commented 10 months ago

The latest ROCm-5.6 is just fine. I also tested ROCm-5.5 with pytorch-2.x; without PCIe atomics, gfx906 always returns invalid results.

If you want to work around the PCIe atomics problem, my suggestion is to roll back to pytorch-1.13.1. It can run SD properly.

https://download.pytorch.org/whl/rocm5.2/torch-1.13.1%2Brocm5.2-cp310-cp310-linux_x86_64.whl
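The "invalid results" symptom can be checked directly by running the same computation on the CPU and on the GPU and comparing. The following is a minimal sketch, not a full validation suite; the helper name `gpu_matches_cpu`, the matrix size, and the tolerance are all arbitrary choices made for illustration.

```python
def gpu_matches_cpu(size=256, tol=1e-3):
    """Compare a float32 matmul on CPU and GPU.

    Returns True/False, or None if torch or a GPU is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    cpu_result = a @ b
    # ROCm builds of PyTorch expose the GPU through the "cuda" device name.
    gpu_result = (a.cuda() @ b.cuda()).cpu()
    return torch.allclose(cpu_result, gpu_result, atol=tol, rtol=tol)

print(gpu_matches_cpu())
```

A `False` here on pytorch-2.x that turns into `True` after rolling back to pytorch-1.13.1 would be consistent with the behavior described above, though a single matmul cannot prove that every kernel is correct.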

brsh1 commented 9 months ago

Should I be using HSA_OVERRIDE_GFX_VERSION=10.3.0 for gfx900? torch.cuda.is_available() reports True, but the MNIST example and Stable Diffusion both fail to run. They get stuck at 100% CPU until I kill the process. Any ideas?
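The thread does not say whether gfx900 needs an override at all. What can be said is that the value 10.3.0 follows the major.minor.stepping pattern of the gfx1030 target (an RDNA2 part), so it asks the runtime to treat the card as a different architecture, which may be related to the hang. The sketch below only illustrates that naming pattern for targets written with decimal digits; `gfx_to_override` is a hypothetical helper, not part of ROCm.

```python
import re

def gfx_to_override(target):
    """Map a target such as 'gfx900' to a 'major.minor.stepping' string.

    Only handles names whose minor and stepping are single decimal digits.
    """
    match = re.fullmatch(r"gfx(\d+)(\d)(\d)", target)
    if match is None:
        raise ValueError(f"unsupported target name: {target!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{minor}.{stepping}"

for name in ("gfx803", "gfx900", "gfx906", "gfx1030"):
    print(name, "->", gfx_to_override(name))
```

By this pattern, the string that corresponds to gfx900 is 9.0.0 rather than 10.3.0.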

viebrix commented 9 months ago

@xuhuisheng Many thanks for your work and description. It helped me a lot in getting an RX 580 to work.

With gfx803 and ROCm 5.6 I got a segmentation fault in web-ui, which suggests that torch (v2.0.1-rc2), torchvision (v0.15.2-rc2), and ROCm 5.6 do not work together. 5.5.0 worked like a charm. Which specific PyTorch and torchvision versions did you use? See also https://github.com/xuhuisheng/rocm-gfx803/issues/27#issuecomment-1665955664
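Both failure modes reported in this thread, a process stuck at 100% CPU and a segmentation fault, can be told apart without starting web-ui by running a tiny GPU snippet in a child process. This is a sketch for POSIX systems; `probe` and `GPU_SNIPPET` are hypothetical names, and the timeout is an arbitrary choice.

```python
import subprocess
import sys

def probe(code, timeout_s=120):
    """Run `code` in a child interpreter and classify how it ended."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if done.returncode < 0:
        # On POSIX a negative return code means the child died from a signal
        # (11 is SIGSEGV, i.e. a segmentation fault).
        return f"killed by signal {-done.returncode}"
    return "ok" if done.returncode == 0 else f"exit code {done.returncode}"

# Smallest GPU workload that still launches a kernel.
GPU_SNIPPET = (
    "import torch; x = torch.ones(8, device='cuda'); "
    "print((x + x).sum().item())"
)

if __name__ == "__main__":
    print(probe(GPU_SNIPPET))
```

Running it once per torch/ROCm combination gives a quick ok/timeout/signal verdict for each pairing before trying a full Stable Diffusion run.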

SLi-Man commented 9 months ago

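My new build: https://github.com/Tokoshie/pytorch-gfx803/releases/tag/v2.1.0a0

Hello, I used the PyTorch 2.1.0a0 build you made, but running Stable-Diffusion-webui also requires a torchaudio that matches the torch version. How should I choose a torchaudio version suitable for this PyTorch build?

I tested torchaudio-2.1.0, which reports an incompatibility. When I forcibly changed the version number of PyTorch 2.1.0a0 to 2.1.0, the incompatibility message went away, but import torchaudio still fails with an error.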

viebrix commented 5 months ago

@SLi-Man I didn't change the original torchaudio from web-ui. But here is a table with the matching versions: https://github.com/pytorch/pytorch/wiki/PyTorch-Versions

I also did an update for the newest PyTorch: https://github.com/viebrix/pytorch-gfx803/tree/main
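For the torchaudio question above, a rough first check is whether the two packages belong to the same release series. For the 2.x releases, torch and torchaudio share the same major.minor number (for example torch 2.0.1 pairs with torchaudio 2.0.2); older releases were numbered differently (torch 1.13.1 pairs with torchaudio 0.13.1), so the sketch below applies to 2.x only. The helper names are hypothetical.

```python
import re

def release(version):
    """Return (major, minor, patch), ignoring suffixes like 'a0' or '+rocm5.6'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())

def same_series(torch_version, torchaudio_version):
    """True if both versions share major.minor (meaningful for 2.x only)."""
    return release(torch_version)[:2] == release(torchaudio_version)[:2]

print(same_series("2.1.0a0", "2.1.0"))   # custom pre-release build vs. release
print(same_series("2.0.1", "2.1.0"))     # different series
```

The installed versions can be read with `importlib.metadata.version("torch")` and `importlib.metadata.version("torchaudio")`. Matching numbers do not guarantee that a prebuilt torchaudio wheel loads against a custom torch build; the import error reported above, even after the version number was edited, is consistent with that, and building torchaudio from source against the same torch may be necessary (an assumption, not confirmed in this thread).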