lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

Speed degradation when using gguf with CFG >1 #1749

Open BaseMe2 opened 1 week ago

BaseMe2 commented 1 week ago

A recent update, sometime between yesterday and today, broke GGUF speed at increased CFG values. At CFG=1, speed is normal, as expected. At CFG=1.5, for example, generation is 10x slower.

NF4 works fine. I restarted Forge and reproduced it twice. Tested with GGUF Q4.

aggregate15 commented 1 week ago

Do you mean CFG or Distilled CFG? CFG in Flux has been like this for me since Flux support first came in: changing it really increases generation times. I don't think CFG makes an impact though. Unless you mean Distilled CFG?

Haoming02 commented 1 week ago

> Speed degradation when CFG >1

This is true for all models, not just Flux. CFG = 1 basically means ignoring the negative prompt. When CFG is not 1, you are doubling the work the model needs to do. Depending on your configuration, you can experience the "10x slower" if your VRAM is overfilled, for example.
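To illustrate the "doubling the work" point, here is a minimal sketch of classifier-free guidance. It is not Forge's actual sampler code: `cfg_denoise` and `toy_model` are hypothetical names, and the toy model just adds a number to each element so the example runs without a GPU. The point is only that a CFG scale of 1 needs one model call per step, while any other value needs two.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical model signature: (latent, prompt embedding) -> prediction.
Model = Callable[[Vector, float], Vector]


def toy_model(x: Vector, prompt: float) -> Vector:
    """Stand-in for a diffusion model: shifts the latent by the prompt value."""
    return [xi + prompt for xi in x]


def cfg_denoise(model: Model, x: Vector, cond: float, uncond: float,
                cfg_scale: float) -> Tuple[Vector, int]:
    """Return the guided prediction and the number of model calls used."""
    cond_pred = model(x, cond)  # positive-prompt pass, always needed
    calls = 1
    if cfg_scale == 1.0:
        # uncond + 1 * (cond - uncond) == cond, so the negative pass can be skipped.
        return cond_pred, calls
    uncond_pred = model(x, uncond)  # negative-prompt pass, the extra work
    calls += 1
    guided = [u + cfg_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]
    return guided, calls


if __name__ == "__main__":
    print(cfg_denoise(toy_model, [1.0, 2.0], 0.5, -0.5, 1.0))
    print(cfg_denoise(toy_model, [1.0, 2.0], 0.5, -0.5, 1.5))
```

With the scale at 1 the function makes one call; at 1.5 it makes two, which matches the roughly 2x slowdown described here as normal. Anything beyond that has to come from somewhere else, such as memory pressure.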

BaseMe2 commented 1 week ago

> Do you mean CFG or Distilled CFG? CFG in Flux has been like this for me since Flux support first came in: changing it really increases generation times. I don't think CFG makes an impact though. Unless you mean Distilled CFG?

CFG, not Distilled CFG. It used to be 2x slower, not 10x slower.

> This is true for all models, not just Flux. CFG = 1 basically means ignoring negative prompt. When CFG is not 1, you are doubling the work it needs to do. Depending on your configuration, you can experience the "10x slower" if your VRAM is overfilled for example.

No, please read. 2x slower was normal before. Now it's 10x slower, with the same settings as before, when it was only 2x as slow.

Something in the caching process must be bugging out and preventing the model from running in memory properly. This was not the case before.

Haoming02 commented 1 week ago

So you are using the exact same settings, and are not running out of VRAM?

BaseMe2 commented 1 week ago

I lowered GPU weights by another 100 MB, and now it's almost as fast as before. Now 800 MB of my GPU's VRAM is not used, but at least it's not running into shared memory any more. It would be nice not to miss out on almost 10% of my GPU, but I don't know how swapping is triggered. 800 MB seems to be the threshold. I set it to 8600 MB on a GPU with 10 GB of VRAM.

Edit: further testing revealed that when using a LoRA like "wow_details", I need to lower it to 8500 MB, or else it runs into swap. This also means 1 GB of my VRAM isn't used. Why is that the case? If I leave it at 8600 MB, the VRAM fills up completely and it starts swapping.

maraan666 commented 1 week ago

Some of your VRAM needs to be used for other stuff...
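As a rough sketch of that budget arithmetic (not how Forge itself accounts for memory): whatever is not reserved for model weights has to cover everything else on the GPU. The function names below are hypothetical, 10 GB is assumed to mean 10240 MB, and `other_usage_mb` is a made-up placeholder for activations, LoRA patches, the desktop, and so on.

```python
def vram_headroom_mb(total_mb: int, gpu_weights_mb: int) -> int:
    """VRAM left over once the weights budget is set aside."""
    return total_mb - gpu_weights_mb


def will_spill(total_mb: int, gpu_weights_mb: int, other_usage_mb: int) -> bool:
    """True if weights plus everything else exceed physical VRAM."""
    return gpu_weights_mb + other_usage_mb > total_mb


if __name__ == "__main__":
    TOTAL_MB = 10 * 1024  # assumption: a "10 GB" card exposes 10240 MB

    # GPU weights settings mentioned in the thread.
    for weights_mb in (8600, 8500):
        print(weights_mb, "->", vram_headroom_mb(TOTAL_MB, weights_mb), "MB headroom")

    # Hypothetical: if everything else needs 1700 MB, 8600 spills and 8500 fits.
    print(will_spill(TOTAL_MB, 8600, 1700))
    print(will_spill(TOTAL_MB, 8500, 1700))
```

Under these assumptions a 100 MB change in the weights setting is enough to flip between fitting and spilling into shared memory, which is consistent with the sharp threshold reported above.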