lllyasviel / stable-diffusion-webui-forge


Custom LoRA: long generation, does not work either #1408

Open konservat0r opened 3 weeks ago

konservat0r commented 3 weeks ago

I’m reaching out to report an issue with image generation when using a custom 164 MB LoRA that I created with Replicate, in conjunction with the Flux model. Generation time is significantly longer, ranging from 20 to 40 minutes per image. Moreover, the LoRA does not appear to be applied correctly, as the results are markedly different from what I would expect.

For comparison, when using a LoRA downloaded from Civitai, the generation time is much shorter, around 1.5 minutes, and the results are as expected. I also tested the custom LoRA on Hugging Face, where it performed as anticipated.

This discrepancy is impacting my workflow, and I would appreciate any guidance or solutions you can provide to address this issue.

Please let me know if you need any additional information or if there are steps I should take to resolve this.

version: f2.0.1v1.10.1-previous-401-g08f74875  •  python: 3.10.6  •  torch: 2.4.0+cu124  •  xformers: N/A  •  gradio: 4.40.0  •  checkpoint: c161224931

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-401-g08f74875
Commit hash: 08f7487590d2b48ae53098a4247f4d067549b0c9
Launching Web UI with arguments: --share
Total VRAM 12288 MB, total RAM 32612 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
G:\webui_forge_cu124_torch24\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: G:\webui_forge_cu124_torch24\webui\models\ControlNetPreprocessor
2024-08-22 18:03:38,547 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'G:\webui_forge_cu124_torch24\webui\models\Stable-diffusion\flux1-dev-bnb-nf4.safetensors', 'hash': '0184473b'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Startup time: 26.2s (prepare environment: 5.3s, import torch: 10.4s, initialize shared: 0.3s, other imports: 0.8s, load scripts: 2.5s, create ui: 3.5s, gradio launch: 3.3s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 91.67% GPU memory (11263.00 MB) to load weights, and use 8.33% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'G:\webui_forge_cu124_torch24\webui\models\Stable-diffusion\flux1-dev-bnb-nf4.safetensors', 'hash': '0184473b'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 2350, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 2.3s (unload existing model: 0.2s, forge model load: 2.1s).
[LORA] Loaded G:\webui_forge_cu124_torch24\webui\models\Lora\tok.safetensors for KModel-UNet with 494 keys at weight 0.94 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11235.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 5056.38 MB
Moving model(s) has taken 6.92 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5956.43 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11191.04 MB
[Memory Management] Required Model Memory: 6246.84 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 3920.19 MB
Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [00:42<00:00, 7.20it/s]
LoRA patching has taken 42.20 seconds
Moving model(s) has taken 46.29 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [07:58<00:00, 23.90s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [07:38<00:00, 24.98s/it]
Begin to load 1 model
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 19691.11 MB ...
[Memory Management] Current Free GPU Memory: 10137.11 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 8953.23 MB
Moving model(s) has taken 0.92 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [07:41<00:00, 23.09s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [07:41<00:00, 24.98s/it]
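For reference, the "[GPU Setting]" split in this log follows directly from the configured inference reserve: Forge keeps the inference memory out of total VRAM and uses the rest as the "GPU Weights" budget. A minimal arithmetic check, using only numbers from the log above:

```python
# Back-of-the-envelope check of the "[GPU Setting]" line above.
total_vram_mb = 12288.0  # "Total VRAM 12288 MB" (RTX 3060 12 GB)
inference_mb = 1024.0    # 'inference_memory' from "Environment vars changed"

weights_mb = total_vram_mb - inference_mb
print(f"weights budget: {weights_mb:.0f} MB ({weights_mb / total_vram_mb:.2%})")
# -> weights budget: 11264 MB (91.67%), matching the reported
#    11263.00 MB / 91.67% to within a megabyte. Suggestion (1) in the
#    reply below shrinks this budget, leaving more free VRAM for
#    LoRA patching and inference.
```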

lllyasviel commented 3 weeks ago

Do not expose gradio sharing URLs in public.

Try the below two things:

  1. Reduce "GPU Weights" by 2GB
  2. In "Diffusion with Low Bits" select "Automatic (fp16 LoRA)"

and then report back which one solved your problem (or both).

If you use (1), then try increasing "GPU Weights" slowly over several runs, and tell me at what value things break.
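For context on why (2) can help on this setup: the checkpoint here is NF4-quantized (flux1-dev-bnb-nf4), so merging a LoRA into the weights ("Patching LoRAs for KModel") has to dequantize, patch, and re-quantize each affected layer, while the fp16 LoRA mode applies the low-rank delta online without rewriting the quantized weights. A rough sketch of the difference, with hypothetical helper names (`dequantize`/`quantize` stand in for bitsandbytes NF4 kernels; this is not Forge's actual code):

```python
import torch

def patch_lora_offline(q_weight, lora_down, lora_up, alpha, dequantize, quantize):
    """Offline patching on a quantized model, per layer (304 layers in the log):
    dequantize -> merge the low-rank delta -> re-quantize. The temporary
    full-precision copy is why this step needs extra free VRAM."""
    w = dequantize(q_weight)             # nf4 -> fp16/bf16 (full-size copy)
    w += alpha * (lora_up @ lora_down)   # delta W = up @ down, shape (out, in)
    return quantize(w)                   # back to nf4

def lora_forward_online(x, base_out, lora_down, lora_up, alpha):
    """Online fp16 LoRA: leave the quantized base weights untouched and add
    the low-rank term to the layer output at inference time."""
    return base_out + alpha * (x @ lora_down.T @ lora_up.T)

# Tiny shape check of the online path (rank 8, 64 -> 128 features):
x = torch.randn(1, 64)
down, up = torch.randn(8, 64), torch.randn(128, 8)
base = torch.zeros(1, 128)
print(lora_forward_online(x, base, down, up, alpha=0.94).shape)  # torch.Size([1, 128])
```

The `alpha=0.94` mirrors the LoRA weight 0.94 in the log; the trade-off is a small extra matmul per layer at every step instead of a one-time, but memory-hungry, patch.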

lllyasviel commented 3 weeks ago

And update to the latest version.

konservat0r commented 3 weeks ago

Thank you very much! Option (2) works: "Patching LoRAs for KModel" now takes less than 1 second and generation begins immediately, with the same times as other LoRAs. I also tried reducing the memory (option 1), but that did not help.

lllyasviel commented 3 weeks ago

update to latest and try (1)

konservat0r commented 3 weeks ago

> update to latest and try (1)

Option (2) is still the only solution to the problem. I tried different memory values, but the computer starts to freeze. In automatic mode, "Diffusion with Low Bits" runs "Patching LoRAs for KModel" for about 73 seconds.

lllyasviel commented 3 weeks ago

After latest updates, you should be able to generate at normal speed after patching LoRAs. If not, let me know

konservat0r commented 3 weeks ago

> After latest updates, you should be able to generate at normal speed after patching LoRAs. If not, let me know

Unfortunately, it does not work in automatic mode: the LoRA is not applied, and the image does not look like the subject at all. Here is what happens in the log:

Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [01:51<00:00, 2.74it/s]
LoRA patching has taken 111.17 seconds
[Memory Management] Loaded to CPU Swap: 3537.45 MB (blocked method)
[Memory Management] Loaded to GPU: 2709.31 MB
Moving model(s) has taken 151.52 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:33<00:00, 4.70s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [01:18<00:00, 4.08s/it]
Begin to load 1 model
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 2197.23 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 4831.34 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1144.00 MB
[Memory Management] Estimated Remaining GPU Memory: 3527.46 MB
Moving model(s) has taken 1.53 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:21<00:00, 4.09s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:21<00:00, 4.08s/it]
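As a side note on the memory figures in this log: the patched model is split between GPU and CPU swap, which accounts for the 151-second "Moving model(s)" time. A small sanity check using only numbers copied from the logs:

```python
# Sanity check on the swap split reported above.
gpu_mb = 2709.31   # "[Memory Management] Loaded to GPU"
swap_mb = 3537.45  # "[Memory Management] Loaded to CPU Swap ... (blocked method)"
print(f"total: {gpu_mb + swap_mb:.2f} MB")
# -> total: 6246.76 MB, essentially the KModel size from the first log
#    ("Required Model Memory: 6246.84 MB"): once patched, the model no
#    longer fits entirely in the GPU weights budget.
```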

konservat0r commented 3 weeks ago

Today I updated the build, and generation became longer: instead of 1.5 minutes, it now takes about 2 min 30 s. The settings are the same.

konservat0r commented 3 weeks ago

In automatic mode, LoRA patching time was reduced to 36 seconds, and generation now takes 3 min 4.1 s.