Same NaN error / not a solution. RX 6800

hydrian / stable-diffusion-webui-rocm

A stable diffusion webui configuration for AMD ROCm

GNU General Public License v2.0


Same NaN error / not a solution. RX 6800 #7

Open InkyZima opened 1 year ago

InkyZima commented 1 year ago

Context: I am here from https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5468. @hydrian I tried this repo / Docker image; it does not work for me. AMD RX 6800, clean Lubuntu host (5.19 kernel). I also tried --precision full, --no-half, and "Upcast cross attention layer to float32". --disable-nan-check just produces black images.

hydrian commented 1 year ago

Last I knew, the 5.19 kernel was not supported by ROCm. Try downgrading to a 5.17 kernel.

hydrian commented 1 year ago

Actually, ROCm 5.5 was just released and supports kernel 5.19. You can try installing that on the host system.

InkyZima commented 1 year ago

Very interesting! Thanks a lot for the info. Will try ASAP, this weekend at the latest, and let you know. Fingers crossed (:

Btw, the readme is missing a closing ' at the end of the command in the line "Run on the command docker build . -t 'stable-diffusion-webui-rocm".

InkyZima commented 1 year ago

Tried; it didn't work with kernel 5.17.15 and ROCm 5.4.2. It keeps producing NaNs / black images only. Regarding ROCm 5.5: I don't know how to get that to work; I can install ROCm 5.5 from AMD on my host, but there is no torch build for rocm5.5.
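To make reports like the one above easier to compare, here is a minimal diagnostic sketch in Python. It is not part of this repo; the function name rocm_torch_report is made up for illustration. It assumes it is run inside the same Python environment the webui uses, and it only reports which PyTorch build is installed and whether a plain matrix multiply on the GPU already produces NaNs. It does not fix anything.

```python
def rocm_torch_report():
    """Collect basic facts about the installed PyTorch build and probe for NaNs.

    Hypothetical helper for illustration; not part of the webui or this repo.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch in this environment, nothing more to report.
        return {"torch_installed": False}

    report = {"torch_installed": True, "torch_version": torch.__version__}
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    # ROCm builds expose the GPU through the torch.cuda namespace.
    report["gpu_available"] = torch.cuda.is_available()
    if not report["gpu_available"]:
        return report

    report["device_name"] = torch.cuda.get_device_name(0)
    for dtype in (torch.float32, torch.float16):
        # Small random matmul; the values stay far below the fp16 overflow range.
        x = torch.randn(256, 256, device="cuda", dtype=dtype)
        y = x @ x
        short_name = str(dtype).split(".")[-1]  # e.g. "float16"
        report[f"nan_in_{short_name}_matmul"] = bool(torch.isnan(y).any().item())
    return report


if __name__ == "__main__":
    for key, value in rocm_torch_report().items():
        print(f"{key}: {value}")
```

If both matmul checks come back False, basic GPU math works and the NaNs are more likely tied to something specific in the model pipeline; if they come back True, the problem sits lower down, in the PyTorch / ROCm / driver stack. Either way it is only a hint, not a diagnosis.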

hydrian commented 1 year ago

How are you installing ROCm on the host system?

I'm using the deb installation, and the ROCm packages are very picky. You can't just use mainline/ukuu to install a kernel of the 'supported' version.

With ROCm 5.4.2, I had to install the kernel deb package linux-oem-22.04. This gives ROCm the 5.17 kernel package it expects. PyTorch wants this version too.

With ROCm 5.5, things get messier. Last I knew, PyTorch only officially supported up to 5.4.2; they haven't added 5.4.3 or 5.5 support officially yet. I'm assuming ROCm 5.5 is based on the linux-image-generic-hwe-22.04 deb kernel package. I'm testing it now. Can't say I'm holding my breath here. So we can try mixed versions. Not great, but if it works it could be helpful for people.

We really need more ROCm development / testing. It feels like ROCm is a second-class citizen compared to CUDA.
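For anyone checking their own host against the kernel / ROCm pairings mentioned in this thread, a small Python sketch follows. The pairing table only records what was said in the comments above (5.17 with ROCm 5.4.2, 5.19 with ROCm 5.5); it is not an official AMD support matrix, and the names THREAD_REPORTED_KERNELS, kernel_major_minor and matches_thread_pairing are made up for illustration.

```python
import platform
import re

# Pairings as stated by commenters in this thread; NOT an official support matrix.
THREAD_REPORTED_KERNELS = {
    "5.4.2": (5, 17),
    "5.5": (5, 19),
}


def kernel_major_minor(release):
    """Extract (major, minor) from a release string like '5.19.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def matches_thread_pairing(rocm_version, release):
    """Return True/False for a known ROCm version, None if the thread has no pairing."""
    expected = THREAD_REPORTED_KERNELS.get(rocm_version)
    if expected is None:
        return None
    return kernel_major_minor(release) == expected


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    for rocm_version in THREAD_REPORTED_KERNELS:
        print(rocm_version, running, matches_thread_pairing(rocm_version, running))
```

Note that a matching pairing is no guarantee: kernel 5.17.15 with ROCm 5.4.2 was reported above to still produce NaNs, and, as described above, the deb packages may care about which kernel package is installed, not just the version number.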

InkyZima commented 1 year ago

Thanks for the info. I'll try to spend some more time testing this weekend. Though it might be wise to just wait a few weeks until pytorch+rocm5.5 is out. Related: https://github.com/vladmandic/automatic/discussions/741#discussioncomment-5809102

hydrian commented 1 year ago

I just updated the rocm5.5 branch. It loads the ROCm 5.5 deb packages but still uses the SDW rocm5.4.2 build. I haven't had any issues with the mixed versions so far.

You can easily build the image by using the command bash build.sh rocm5.5 and deploy it with the standard docker-compose command.

See how it works for you.

InkyZima commented 1 year ago

Hi, thanks for the effort; unfortunately no luck, same NaN error. As a side note (I'm sure there's a way to do this, I'm just not skilled enough with Docker): when I want to change the COMMANDLINE_ARGS (for example, to see if it works with --precision full --no-half), I edit the docker-compose.yml (e.g. uncommenting that env variable), and that leads to a re-download of PyTorch (1.5 GB of data) on the next docker-compose up, which is annoying. I think this could be avoided. Thanks again for the effort.

hydrian commented 1 year ago

That is sort of how the SDW application works. I can't really help that; it happens inside the application. When you update the docker-compose file, Docker redeploys the whole container, so SDW can't find the previous download and downloads it again.

The other option is to make it part of the container image, which isn't ideal.