SeBL4RD closed this issue 1 month ago.
You can see your VRAM usage in Task Manager (Performance tab). That will show whether it's related to running out of memory.
I constantly monitor my RAM/VRAM, and nothing changes between the two cases: consumption stays the same, and half my RAM is free, as you can see.
I can't use Hires Fix because of this: upscaling from 1152x896 by 1.55x to 1785x1388 takes 35 s/it, whereas when it doesn't bug out I can generate at 1080p at 3.7 s/it... it makes no sense.
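To put those numbers in perspective, here is a quick sketch of the arithmetic. It assumes the Hires Fix target size is simply truncated to whole pixels (Forge may round it differently, e.g. to a multiple of 8), and the quadratic bound is only a rough ceiling assuming per-step cost grows no faster than the square of the pixel count.

```python
def upscaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    # Assumption: plain truncation; Forge's actual rounding rule may differ.
    return int(width * scale), int(height * scale)

def pixel_ratio(a: tuple[int, int], b: tuple[int, int]) -> float:
    """How many times more pixels resolution a has than resolution b."""
    return (a[0] * a[1]) / (b[0] * b[1])

hires = upscaled_size(1152, 896, 1.55)     # matches the 1785x1388 quoted above
ratio = pixel_ratio(hires, (1920, 1080))   # extra pixels compared with 1080p
quadratic_bound = ratio ** 2               # rough ceiling if cost grows with pixels squared
observed = 35 / 3.7                        # observed slowdown per step

print(hires, round(ratio, 2), round(quadratic_bound, 2), round(observed, 1))
# (1785, 1388) 1.19 1.43 9.5
```

The hires pass has only about 19% more pixels than 1080p, so even a pessimistic scaling assumption explains roughly 1.4x, not the observed 9.5x.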
Does that say 13 GB of VRAM used? If so, that looks like far too little to have the Q8 Flux, Q8 T5, CLIP and VAE all loaded in VRAM. Mine, for example, takes up 18-19 GB.
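A back-of-the-envelope estimate supports the 18-19 GB figure. GGUF's Q8_0 format stores weights in blocks of 32 int8 values plus one fp16 scale (34 bytes per 32 weights). The parameter counts below are approximate assumptions, not values read from the actual files, and CLIP-L and the VAE are assumed to be kept in fp16. Activations and other buffers come on top of this.

```python
# Approximate parameter counts (assumptions, not measured from the model files).
PARAMS = {
    "flux1-dev": 12.0e9,
    "t5-v1_1-xxl encoder": 4.7e9,
    "clip-l": 0.123e9,
    "vae": 0.084e9,
}

# Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per block of 32 weights.
Q8_0_BYTES_PER_WEIGHT = 34 / 32
FP16_BYTES_PER_WEIGHT = 2.0

def weights_gb(params: float, bytes_per_weight: float) -> float:
    """Size of a set of weights in decimal gigabytes."""
    return params * bytes_per_weight / 1e9

def estimate_total_gb() -> float:
    total = weights_gb(PARAMS["flux1-dev"], Q8_0_BYTES_PER_WEIGHT)
    total += weights_gb(PARAMS["t5-v1_1-xxl encoder"], Q8_0_BYTES_PER_WEIGHT)
    # Assumption: the small CLIP-L text encoder and the VAE stay in fp16.
    total += weights_gb(PARAMS["clip-l"], FP16_BYTES_PER_WEIGHT)
    total += weights_gb(PARAMS["vae"], FP16_BYTES_PER_WEIGHT)
    return total

print(f"~{estimate_total_gb():.1f} GB of weights, before activations")
# ~18.2 GB of weights, before activations
```

That is well above what a 12 GB card can hold at once, which is consistent with Forge having to keep part of the model outside VRAM.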
If Forge detects that you don't have enough VRAM, it swaps parts of the model in and out of VRAM, which takes time.
Sometimes, after I force-stop one or more generations, the speed goes back to normal... ???
How much VRAM does the 3080 Ti have? Flux Q8 and T5 Q8 need around 18 GB free; otherwise Forge will swap things back and forth between VRAM and RAM between generations, which can slow things down. If you have less VRAM than that, you might need the NF4 model for consistent speed and memory use.
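A rough way to see why 12 GB is not enough: whatever does not fit after leaving some headroom for computation has to live in system RAM and be moved over when needed. A minimal sketch, where the 2 GB reserve is an illustrative assumption and not a Forge setting or default:

```python
def offloaded_gb(weights_gb: float, vram_gb: float, reserve_gb: float) -> float:
    """GB of weights that cannot stay resident in VRAM and must be swapped from RAM.

    reserve_gb is a hypothetical amount of VRAM kept free for activations and buffers.
    """
    resident_budget = max(vram_gb - reserve_gb, 0.0)
    return max(weights_gb - resident_budget, 0.0)

# ~18 GB of Q8 weights (figure from the thread) on a 12 GB card, assumed 2 GB reserve.
print(offloaded_gb(18.0, 12.0, 2.0))  # 8.0
```

Anything above zero means some weights cross the PCIe bus during generation, which is where the extra time goes.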
12 GB. But then why does it sometimes work so well and sometimes not?
It's exactly the same with Q4_K_S and T5 Q4_K_S; there has to be a problem somewhere.
That's because Forge does not free all the VRAM after the first generation, or does not reuse the VRAM it freed. I'm also experiencing this with the latest update of Forge. Due to the lack of VRAM, the second generation runs out of memory and falls back to RAM, which makes generation 3-4 times slower.
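The mechanism described above can be illustrated with a toy model: if every generation leaves some VRAM allocated and never reclaims it, the headroom shrinks until a later generation no longer fits. All numbers here are hypothetical and chosen only to show the effect; they are not measured from Forge.

```python
def simulate_generations(vram_gb: float, resident_gb: float, working_gb: float,
                         leaked_per_gen_gb: float, generations: int) -> list[bool]:
    """Toy model: True if a generation's working memory fits in the VRAM left over.

    Assumes each generation leaves leaked_per_gen_gb allocated and never reclaimed.
    """
    leaked = 0.0
    fits = []
    for _ in range(generations):
        free = vram_gb - resident_gb - leaked
        fits.append(free >= working_gb)
        leaked += leaked_per_gen_gb
    return fits

# Illustrative: 12 GB card, 9 GB of resident weights, 2.5 GB needed per run, 1 GB leaked.
print(simulate_generations(12.0, 9.0, 2.5, 1.0, 3))  # [True, False, False]
```

In this model the first run fits and every later one spills into RAM, matching the "first generation fast, second slow" pattern reported in this thread.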
Hi, do you have the full console logs?
I could make you one. I'll do it as soon as possible.
Full log: https://pastebin.com/JaF3wGa5
As you can see, the first image generates normally (3.6 s/it). The second one is slower and stabilizes at 9 s/it, so I interrupt it. The third image is back to 3.6 s/it.
And so on.
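When comparing runs from a log like this, it can help to express each generation's speed relative to the fastest one. A small sketch using the s/it values reported above; the 1.5x threshold for calling a run "slow" is an arbitrary choice for illustration.

```python
def relative_slowdown(sec_per_it: list[float]) -> list[float]:
    """Each run's seconds per iteration divided by the fastest run's."""
    fastest = min(sec_per_it)
    return [round(s / fastest, 2) for s in sec_per_it]

def slow_runs(sec_per_it: list[float], threshold: float = 1.5) -> list[int]:
    """0-based indices of runs more than `threshold` times slower than the fastest."""
    return [i for i, r in enumerate(relative_slowdown(sec_per_it)) if r > threshold]

runs = [3.6, 9.0, 3.6]  # s/it for images 1-3, as reported above
print(relative_slowdown(runs))  # [1.0, 2.5, 1.0]
print(slow_runs(runs))          # [1]
```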
- Are you using a PC with both an HDD and an SSD?
- What happens if you don't use the LoRA?
Yeah, I have 3 SSDs and 2 HDDs: 970 EVO+ 1 TB, 860 EVO 500 GB, 870 EVO 1 TB, Barracuda 1 TB, WD Blue 2 TB. I don't do sh*t like putting pagefile.sys or other paging files on an HDD, if that's the question. Only C: (SSD) has one.
I'll try without the LoRA.
I added some possible fixes. Update, try again, and post the full console log, without the LoRA if possible.
(Before your update): Hmm, actually, without the LoRA I don't have this problem. I've done 5 in a row at 3.6 s/it.
I'll update and let you know.
By the way, another person solved it: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1630 Hopefully you'll have some luck too.
Oh wait, if you don't have the problem when not using LoRAs, then it is a completely different problem.
You should lower "GPU weights" a bit and try again. You should be able to find a value for that LoRA that works 100% of the time; then tell me the number that works.
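The trial-and-error procedure suggested above amounts to a simple step-down search. A minimal sketch, where `is_fast` is a hypothetical callback standing in for "set this GPU weights value, run a few generations, and check the s/it stays normal"; the start value, step and floor are illustrative, not Forge defaults.

```python
from typing import Callable, Optional

def find_stable_gpu_weights(start_mb: int, step_mb: int, min_mb: int,
                            is_fast: Callable[[int], bool]) -> Optional[int]:
    """Lower the GPU weights value in fixed steps until test runs stay fast.

    Returns the first (highest) value that passes, or None if none down to min_mb does.
    """
    value = start_mb
    while value >= min_mb:
        if is_fast(value):
            return value
        value -= step_mb
    return None

# Hypothetical example: runs are only consistently fast at 9800 MB or below.
print(find_stable_gpu_weights(10000, 100, 5000, lambda mb: mb <= 9800))  # 9800
```

Stepping down from the current value keeps as much of the model in VRAM as possible while still leaving room for the LoRA.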
It seems to have worked. I know my LoRA is less than 200 MB, so I lowered it by 200 MB and it's fine: I get 3.6 s/it every time. Thanks :)!
So the number is 200 MB?
I can only say that in my case, it worked x)
Update and try again; this time you should not need to drop that 200 MB.
Seems to be working well.
Good
Thanks again! I'll be able to do without NF4 :)
I don't understand why; I imagine it may be due to a bad RAM/VRAM purge? When I use flux1-dev-Q8_0.gguf + t5-v1_1-xxl-encoder-Q8_0.gguf, the first generation runs at about 3.75 s/it, but when I start a second generation with the same settings, without changing anything, it drops to 15.35 s/it....
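To see what that slowdown means per image, multiply the step count by the time per step. The step count is not given in the post, so 20 is a hypothetical value; loading, text encoding and VAE decoding are ignored.

```python
def seconds_per_image(steps: int, sec_per_it: float) -> float:
    # Sampling steps only; ignores model loading, text encoding and VAE decode.
    return steps * sec_per_it

STEPS = 20  # hypothetical step count; the post doesn't say how many were used
first = seconds_per_image(STEPS, 3.75)
second = seconds_per_image(STEPS, 15.35)
print(f"{first:.0f} s vs {second:.0f} s ({second / first:.1f}x slower)")
# 75 s vs 307 s (4.1x slower)
```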
It's a pity, because the first generation at 1920x1080 gives good results, very close to Flux1dev + t5xxl fp16.
Ryzen 7 3700X, 32 GB RAM, RTX 3080 Ti, 970 EVO+, Windows 10. Latest version of Forge.