patientx / ComfyUI-Zluda

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface. Now ZLUDA enhanced for better AMD GPU performance.
GNU General Public License v3.0

Amd driver 24.10.1 performance drop #33

Open xCentral opened 1 week ago

xCentral commented 1 week ago

Your question

Using WSL with ROCM, installed driver (24.10.1 on the windows side), nothing changed except the drivers. I lost nearly 2/3rds of my performance when it came to image generation on my 7900xtx.

The only thing that fixed this was rolling back to the previous drivers. While I understand that the majority of users for this fork prioritize Zluda for generation, still kinda curious if you've noticed a performance decrease.

Images showing diff https://imgur.com/a/RoKfv67

Logs

No response

Other

No response

patientx commented 1 week ago

I don't use wsl so can't comment. Have you tried the normal comfy ? Is it same there?

xCentral commented 6 days ago

Oh, I don't believe the issue is related to ComfyUI. It's solely the drivers; everything else remains the same. I can't imagine why normal ComfyUI would work any differently than this fork. Using pytorch version: 2.4.1+rocm6.1. I guess it could be a PyTorch update thing, but that seems unlikely. I was just curious if ZLUDA was also impacted by this update.

patientx commented 6 days ago

Most people using this either use it solely on Windows or have lower-tier GPUs that don't work at all with WSL. Or they just use Linux natively.

pw405 commented 6 days ago

I use an XTX on Windows and didn't observe any perf drops with 24.10.1. However, have you tried different command-line attention optimizations to see if there's any difference? It defaults to --use-quad-cross-attention, but --use-split-cross-attention also works well.
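If you want to A/B test this yourself, here's a minimal sketch of trying each attention backend in turn. These flag names come from ComfyUI's standard CLI; run each one, generate the same image, and compare the it/s reported in the console:

```shell
# Launch ComfyUI with each cross-attention implementation and compare speed.
# --use-quad-cross-attention is this fork's default.
python main.py --use-quad-cross-attention
python main.py --use-split-cross-attention
python main.py --use-pytorch-cross-attention
```

Only one flag can be active per launch, so restart between runs.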

I've noticed odd performance issues on the XTX (actually seeing it with SD 3.5 Large now): Radeon software will report ~3,000 MHz clock speed, but only ~250 watts instead of ~350. Are you seeing that happen by chance?

(I don't know what it means when this happens - but I saw it on SD 1.5 with DirectML, and sometimes with Flux and Zluda).

xCentral commented 6 days ago

> I use an XTX on Windows and didn't observe any perf drops with 24.10.1. However, have you tried different command-line attention optimizations to see if there's any difference? It defaults to --use-quad-cross-attention, but --use-split-cross-attention also works well.
>
> I've noticed odd performance issues on the XTX (actually seeing it with SD 3.5 Large now): Radeon software will report ~3,000 MHz clock speed, but only ~250 watts instead of ~350. Are you seeing that happen by chance?
>
> (I don't know what it means when this happens - but I saw it on SD 1.5 with DirectML, and sometimes with Flux and Zluda.)

Yup, I was primarily using Pony and SDXL when this was happening. Personally I'm using --use-pytorch-cross-attention, which is needed for https://github.com/Beinsezii/comfyui-amd-go-fast. That's what has been working best for me thus far! Hopefully AMD knows what's up.
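For anyone else wanting to try this combination, here's a sketch of the usual setup, assuming the standard ComfyUI custom_nodes directory layout (paths may differ in your install):

```shell
# Install the comfyui-amd-go-fast custom node into ComfyUI's custom_nodes
# directory, then launch with PyTorch cross-attention, which the node requires.
cd ComfyUI/custom_nodes
git clone https://github.com/Beinsezii/comfyui-amd-go-fast
cd ..
python main.py --use-pytorch-cross-attention
```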