xuhuisheng / rocm-gfx803

Does this work for RX 550 and on Arch Linux? #20

Open brentleywilson opened 1 year ago

xuhuisheng commented 1 year ago

The RX 550 should be a gfx803 card, so you can give it a try. Arch should already support gfx803 in AMDGPU_TARGETS, so you can try that too: https://github.com/rocm-arch/rocm-arch

brentleywilson commented 1 year ago

Hi, I tried rocm-arch through arch4edu and it installed fine, but when I run /opt/rocm/bin/rocminfo it shows only the CPU, not the GPU. I am trying to run Stable Diffusion, and starting the webui gives the CUDA "no GPU" error. How do I check whether ROCm is working properly on my Arch install and is detecting my GPU? Sorry for all the questions, but I have been trying this for a while, haha.

xuhuisheng commented 1 year ago

@brentleywilson First of all, I tried Stable Diffusion on my RX580, but unfortunately the RX580 throws an error like a sync timeout and does not produce proper results. The error is raised by the kernel driver, so I don't think I can solve this problem.

If you use ROCm 5.3 on gfx803, please install https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm530/hsa-rocr_1.7.0.50300-63.20.04_amd64.deb. This is my patched version, which skips the enable-debugging error, so rocminfo will then report correct info on gfx803.

Alternatively, don't install amdgpu-dkms separately. The driver in the upstream kernel is old enough that it doesn't have this problem that kills gfx803.
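
As a rough way to check whether rocminfo sees the GPU at all, its output can be filtered for gfx target names. This is only a sketch: it assumes GPU agents appear in the rocminfo output with a name such as gfx803, and list_gfx_targets and has_gpu_agent are hypothetical helper names, not part of ROCm.

```shell
# Sketch: print the distinct gfx targets found in rocminfo output read from stdin.
# Assumption: GPU agents are listed with a name like "gfx803".
list_gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Exit status 0 if at least one gfx target is listed on stdin, non-zero otherwise.
has_gpu_agent() {
    [ -n "$(list_gfx_targets)" ]
}

# Usage on a real system:
#   /opt/rocm/bin/rocminfo | list_gfx_targets
#   /opt/rocm/bin/rocminfo | has_gpu_agent && echo "GPU agent found" || echo "CPU only"
```

If nothing is printed, rocminfo is reporting only CPU agents, which matches the symptom described above.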

brentleywilson commented 1 year ago

@xuhuisheng PCI atomics are needed for the RX 550, but when I run dmesg I get "PCI rejects atomics 700<0"

xuhuisheng commented 1 year ago

If you see "kfd kfd: skipped device 1002:67df, PCI rejects atomics" in dmesg, it means your motherboard or CPU does not support PCIe atomics, so a gfx803 card cannot run ROCm on that motherboard or CPU. The PCIe atomics requirement is written into the amdgpu firmware, so we cannot work around it.
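
A quick way to look for this message is to filter the kernel log for it. This is a minimal sketch: kfd_atomics_rejected is a hypothetical helper name, and it matches only the "PCI rejects atomics" text quoted in this thread.

```shell
# Sketch: succeed if the kernel log text on stdin contains the KFD message
# that reports rejected PCIe atomics.
kfd_atomics_rejected() {
    grep -q 'PCI rejects atomics'
}

# Usage on a real system (reading dmesg may require root):
#   dmesg | kfd_atomics_rejected && echo "KFD skipped the GPU: PCIe atomics rejected"
```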

As a next step, if you want to test ROCm, you need to change the motherboard, the CPU, or the GPU. Since gfx803 cannot run Stable Diffusion properly, my suggestion is to change the card to at least gfx9.

preet commented 1 year ago

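I thought I would add my own results. I was able to get SD to mostly work on my RX580 following the summary steps posted by tmpuserx in https://github.com/xuhuisheng/rocm-gfx803/issues/19. Things I noticed:

  • clinfo doesn't see/recognize any devices
  • When I kill a process that is using/querying GPU compute, my screen shows some lines/artifacts very briefly. This happens for everything, not just SD: for example, if I run the PyTorch MNIST example or just tell TensorFlow to list available devices. For whatever reason I didn't notice this before, but I reinstalled everything and this time I see the artifacts.
  • Occasionally the SD output images start showing some artifacts (in the image, not on the screen). For me, just restarting the process gets rid of this issue.

It's not a great experience.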

xuhuisheng commented 1 year ago

@preet OpenCL has a known issue here: we need to set an environment variable to re-enable OpenCL on gfx803.

ROC_ENABLE_PRE_VEGA=1 /opt/rocm/opencl/bin/clinfo

sithil94 commented 1 year ago

I thought I would add my own results. I was able to get SD to mostly work on my RX580 following the summary steps posted by tmpuserx in #19. Things I noticed:

  • clinfo doesn't see/recognize any devices
  • When I kill a process that is using/querying GPU compute, my screen shows some lines/artifacts very briefly. This happens for everything, not just SD: for example, if I run the PyTorch MNIST example or just tell TensorFlow to list available devices. For whatever reason I didn't notice this before, but I reinstalled everything and this time I see the artifacts.
  • Occasionally the SD output images start showing some artifacts (in the image, not on the screen). For me, just restarting the process gets rid of this issue.

It's not a great experience.

How did you get SD to run? I get "Unable to find code object for all current devices". I'm on an RX480.