lllyasviel / stable-diffusion-webui-forge


Offline LoRA patching #1473

Open stefanasandei opened 2 weeks ago

stefanasandei commented 2 weeks ago

I have an RTX 2060 GPU with 6 GB of VRAM. The generation speed is good (~2 minutes for 25 steps) with the NF4 checkpoint. The big issue is LoRA patching: it takes considerably more time before generation starts, and most of the time it crashes ForgeUI. Could this LoRA patching be done offline, before starting Forge? I mean saving the patched LoRA to a new file and using that one in the UI. I don't see the point in patching the same combination of LoRAs over and over again (unless there is a technical issue I'm not aware of).

Basically, what I'm asking for is a script that patches a couple of LoRAs before starting the UI, so when Forge starts generating, it doesn't need to waste time patching the LoRAs for NF4 again. If this is doable and the maintainers agree it can be useful, I can offer to code it myself.
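
To make the idea concrete, here is a minimal sketch of what such an offline merge script could look like. It assumes safetensors files, a simple 1:1 mapping between checkpoint and LoRA keys, and the common `lora_up`/`lora_down` key layout; real checkpoints and LoRAs often use different key names and scaling conventions, so treat it as illustrative only, not as how Forge actually patches:

```python
import torch
from safetensors.torch import load_file, save_file

def merge_lora(base_path: str, lora_path: str, out_path: str, scale: float = 1.0):
    base = load_file(base_path)   # base checkpoint tensors
    lora = load_file(lora_path)   # LoRA tensors

    for key, weight in base.items():
        # hypothetical key layout: "<module>.lora_down.weight" / "<module>.lora_up.weight"
        down_key = key.replace(".weight", ".lora_down.weight")
        up_key = key.replace(".weight", ".lora_up.weight")
        if down_key in lora and up_key in lora:
            down = lora[down_key].float()
            up = lora[up_key].float()
            # W' = W + scale * (up @ down), computed in fp32 then cast back
            base[key] = (weight.float() + scale * (up @ down)).to(weight.dtype)

    save_file(base, out_path)

# e.g. merge_lora("flux-dev.safetensors", "style.safetensors", "flux-dev-style.safetensors")
```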

lllyasviel commented 2 weeks ago

You can skip the patching if you have read the link in the Readme Quick List: https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file#quick-list. Let me know if you still want to save the merged model even after you know you can skip patching.

bnelsey commented 2 weeks ago

> You can skip the patching if you have read the link in the Readme Quick List: https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file#quick-list. Let me know if you still want to save the merged model even after you know you can skip patching.

Yes, I definitely still want it - my use case is really specific since I only use one particular style LoRA all the time. While I can skip "patching LoRAs" by setting Automatic FP16 LoRA, the resulting images come out relatively terrible compared to the results from patching LoRAs (same seed and all other settings identical). For example:

  1. set a seed of 9001 and the Automatic FP16 LoRA option
  2. generate 10 pics, save the results
  3. change Automatic FP16 LoRA to just Automatic
  4. generate 10 pics, wait for the LoRA to patch, save the results
  5. be sad that step 4 generated significantly better images and finished faster

As noted in step 5, LoRA patching gives better images and finishes faster - so why do I want to avoid it? Because I only have 8 GB of VRAM and 24 GB of RAM, which means:

  1. patching is slow at 1.6 it/s since it won't use the GPU at all and my CPU is slow
  2. patching fills up my entire 24 GB of RAM and offloads some of it to the pagefile on my SSD

I'm guessing that FP16 LoRA doesn't work too well with Flux Dev NF4, which is why I'm getting pretty bad images when not doing the LoRA patching. I imagine I don't need to post comparison pics, because FP16 LoRA + Flux Dev NF4 gives me SD 1.5 level results - it's pretty bad. I am fine with the FP16 LoRA speed penalty since I get to avoid hitting the pagefile, but the quality penalty is pretty much unbearable.
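
My rough mental model (could be wrong) is that patching dequantizes the NF4 weights, adds the LoRA delta in full precision, and then re-quantizes once, whereas the FP16 LoRA path keeps the delta separate from the quantized weights. Something like the sketch below, using bitsandbytes functions for a single tensor; this is purely illustrative and not Forge's actual code:

```python
import torch
import bitsandbytes.functional as bnbF

def merge_lora_into_nf4(nf4_weight, quant_state, lora_up, lora_down, scale=1.0):
    # 1. dequantize the NF4 tensor back to full precision
    w = bnbF.dequantize_4bit(nf4_weight, quant_state).float()
    # 2. add the LoRA delta while everything is in full precision
    w = w + scale * (lora_up.float() @ lora_down.float())
    # 3. re-quantize once; generation then runs on the merged NF4 weights
    return bnbF.quantize_4bit(w.to(torch.float16), quant_type="nf4")
```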