kohya-ss / sd-scripts


Extracted Flux LoRA not working #1496

Open DarkViewAI opened 3 weeks ago

DarkViewAI commented 3 weeks ago

2 issues:

  1. Sometimes it works, but the result looks nothing like the fine-tuned model.
  2. Sometimes it doesn't work at all and I get the error below.

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules_forge/main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules/txt2img.py", line 112, in txt2img_function
    processed = processing.process_images(p)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules/processing.py", line 815, in process_images
    res = process_images_inner(p)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules/processing.py", line 958, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules/processing.py", line 1329, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/modules/sd_samplers_kdiffusion.py", line 198, in sample
    sampling_prepare(self.model_wrap.inner_model.forge_objects.unet, x=x)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/sampling/sampling_function.py", line 380, in sampling_prepare
    memory_management.load_models_gpu(
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/memory_management.py", line 587, in load_models_gpu
    loaded_model.model_load(model_gpu_memory_when_using_cpu_swap)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/memory_management.py", line 392, in model_load
    raise e
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/memory_management.py", line 387, in model_load
    self.real_model = self.model.forge_patch_model(patch_model_to)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/patcher/base.py", line 226, in forge_patch_model
    self.lora_loader.refresh(target_device=target_device, offload_device=self.offload_device)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/patcher/lora.py", line 394, in refresh
    weight = merge_lora_to_weight(current_patches, weight, key, computation_dtype=torch.float32)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/Ubuntu/apps/stable-diffusion-webui-forge/backend/patcher/lora.py", line 72, in merge_lora_to_weight
    weight = weight.to(dtype=computation_dtype)
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 45.21 GiB
Requested : 252.00 MiB
Device limit : 47.44 GiB
Free (according to CUDA) : 10.25 MiB
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
```
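For context: those numbers are consistent with Forge upcasting the whole Flux transformer to float32 while merging the LoRA (`weight.to(dtype=computation_dtype)` in the last frame). A rough back-of-the-envelope, assuming a ~12B-parameter Flux.1-dev transformer (the parameter count is my assumption, not from this thread):

```python
# Rough VRAM arithmetic for the OOM above. Assumption: the full ~12B-parameter
# Flux transformer is held in float32 during the LoRA merge.
params = 12e9                 # approximate Flux.1-dev transformer parameter count
fp32_bytes = params * 4       # 4 bytes per float32 weight
print(f"fp32 copy: {fp32_bytes / 2**30:.1f} GiB")  # ~44.7 GiB
# Close to the 45.21 GiB "currently allocated" in the traceback, so even the
# small 252 MiB request pushes past the 47.44 GiB device limit.
```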

DarkViewAI commented 3 weeks ago

When it loads:

```
Loading Model: {'checkpoint_info': {'filename': '/home/Ubuntu/apps/stable-diffusion-webui-forge/models/Stable-diffusion/flux1-dev.safetensors', 'hash': 'b04b3ba1'}, 'additional_modules': ['/home/Ubuntu/apps/stable-diffusion-webui-forge/models/VAE/ae.safetensors', '/home/Ubuntu/apps/stable-diffusion-webui-forge/models/VAE/clip_l.safetensors', '/home/Ubuntu/apps/stable-diffusion-webui-forge/models/VAE/t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
```

DarkViewAI commented 3 weeks ago

When the LoRA actually works, it doesn't look like the character I fine-tuned. The full checkpoint works as normal, but the extracted LoRA's likeness is gone.

Previously with SDXL, when I used a min diff of 0.001, it would work perfectly.
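For readers unfamiliar with the flag: in kohya's extraction script, a min-diff threshold decides which modules differ enough from the base model to be worth extracting. A minimal sketch of the idea for one 2D weight; this is illustrative only, not the actual `networks/extract_lora_from_models.py` code, which also handles conv layers and clamping:

```python
import torch

# Min-diff-gated LoRA extraction for a single weight matrix (illustrative).
def extract_module(w_tuned: torch.Tensor, w_base: torch.Tensor,
                   rank: int = 32, min_diff: float = 0.001):
    diff = (w_tuned - w_base).float()
    if diff.abs().max() < min_diff:
        return None                    # module treated as unchanged: no LoRA
    u, s, vh = torch.linalg.svd(diff)  # low-rank approximation of the delta
    lora_up = u[:, :rank] * s[:rank]   # fold singular values into the "up" side
    lora_down = vh[:rank, :]
    return lora_up, lora_down          # lora_up @ lora_down ≈ diff
```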

flamed0g commented 3 weeks ago

The extraction is working for me, but the result also doesn't look like my fine-tune, unfortunately; not even close. The checkpoint itself is fine, but the extracted LoRA is not great.

flamed0g commented 3 weeks ago

I also got some blurry results, but I'm not sure if this is a LoRA issue or a Flux issue.

kohya-ss commented 3 weeks ago

As for the error with Forge: I don't know how Forge handles LoRA, so I'm not sure of the reason. It may be improved if you convert the LoRA with networks/convert_flux_lora.py.
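In case it helps other readers: a format conversion like this mostly renames state-dict keys between naming conventions, and a key-name mismatch can make a LoRA load but silently fail to apply. An illustrative sketch only; the mapping below is a made-up example, not the real one in `networks/convert_flux_lora.py`:

```python
from safetensors.torch import load_file, save_file

# Hypothetical key-renaming pass: different UIs expect different LoRA key
# prefixes, so conversion is essentially a rename over the state dict.
def rename_keys(src_path: str, dst_path: str) -> None:
    sd = load_file(src_path)
    out = {k.replace("lora_unet_", "transformer."): v for k, v in sd.items()}
    save_file(out, dst_path)
```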

I will investigate the result of the extracted LoRA. What if you use flux_minimal_inference.py?

DarkViewAI commented 3 weeks ago

Yeah, it still doesn't work; the images are not like the samples during training, or like the checkpoint. The checkpoint works fine.

The extracted LoRA's images have no likeness to the character, and most images are blurry when generated. Tried in Comfy, Swarm and Forge.

kohya-ss commented 3 weeks ago

A LoRA with rank=32 was extracted from the 'woodblock print' model trained for validation. It seems to be working. Please try increasing the multiplier (application rate) for the LoRA.

The images below are: no application (multiplier=0, same as flux1-dev), multiplier=1, multiplier=1.5, and the fine-tuned model.
![mul=0](https://github.com/user-attachments/assets/18fbdfc3-b2be-47aa-bff5-ff5c88eea577) ![mul=1 0](https://github.com/user-attachments/assets/10d10aef-8f1e-4b4c-9bb4-7bdff687da93) ![mul=1 5](https://github.com/user-attachments/assets/b60f6614-1afd-4e47-adff-764a79752828) ![fine_tuned](https://github.com/user-attachments/assets/44298c33-a72f-469a-8382-f87e98ad29a4)
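The multiplier enters linearly in the usual LoRA merge rule, so raising it simply scales the extracted delta. A sketch under the standard formulation (the exact scaling in a given loader may differ):

```python
import torch

# W_eff = W + multiplier * (alpha / rank) * (lora_up @ lora_down)
def merge(weight, lora_up, lora_down, alpha, multiplier=1.5):
    rank = lora_down.shape[0]                     # rows of the "down" matrix
    scale = multiplier * alpha / rank
    delta = (lora_up.float() @ lora_down.float()).to(weight.dtype)
    return weight + scale * delta
```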
DarkViewAI commented 3 weeks ago

Hi, I was using rank 128. So do I just add --multiplier=1.5?

DarkViewAI commented 3 weeks ago

Ohhh, you mean when using the LoRA I do :1.5.

Is there a way to implement it into the extraction?

kohya-ss commented 3 weeks ago

Unfortunately, scaling during extraction is not supported yet. I think ComfyUI etc. have a setting for the LoRA application ratio (multiplier, weight, etc.).
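Until extraction-time scaling lands, one workaround is to bake a fixed multiplier into the saved file by scaling the "up" matrices once; scaling either side by s scales the merged delta by s. A sketch assuming sd-scripts-style key names (the key suffix and file names are my assumptions; check your file's keys first):

```python
from safetensors.torch import load_file, save_file

# Bake a fixed multiplier into an extracted LoRA by scaling its "up" weights.
def bake_multiplier(src: str, dst: str, s: float = 1.5) -> None:
    sd = load_file(src)
    for k in sd:
        if k.endswith("lora_up.weight"):   # assumed sd-scripts key suffix
            sd[k] = sd[k] * s
    save_file(sd, dst)

bake_multiplier("extracted.safetensors", "extracted_x1.5.safetensors")
```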

flamed0g commented 3 weeks ago

> Unfortunately, scaling during extraction is not supported yet. I think ComfyUI etc. have a setting for the LoRA application ratio (multiplier, weight, etc.).

@kohya-ss Just for my understanding, do you mean the strength of the LoRA?

For me the LoRA was definitely being applied (there was a difference between 0 and 1), but the likeness just wasn't the same. I can definitely try 1.5, like so:

(image attachment)

kohya-ss commented 3 weeks ago

> @kohya-ss Just for my understanding, do you mean the strength of the LoRA?

Yes, that's what I mean. I think you can use 1.5 for the strength; it's worth a try.

DarkViewAI commented 3 weeks ago

Works well now at 1.5 strength, thanks @kohya-ss

KujoAI commented 3 weeks ago

@kohya-ss Just woke up to some good news. However, I noticed some issues: using 1.5 gives the output an overtrained effect. I would still love a min-diff type of option; hopefully CLIP text encoder training comes soon.

flamed0g commented 3 weeks ago

I also get that overtrained effect with my LoRA, sadly. 1.5 strength does not seem ideal; maybe for style LoRAs, but for a character it's not great. It also becomes pretty noisy.

DarkViewAI commented 3 weeks ago

> Works well now at 1.5 strength, thanks @kohya-ss

Okay, yeah, I noticed now that my extracted LoRAs have a grainy feeling to them.