lllyasviel / stable-diffusion-webui-forge


Question: FLUX LoRA and non-LoRA speed #1739

Open kalle07 opened 2 months ago

kalle07 commented 2 months ago

(RTX 4060) OK, I tried both with and without a LoRA, using different samplers and schedulers (resolution 1024x768). If I generate an image with a LoRA, speed is ~3 s/it; under the same conditions without one, it's ~1.5 s/it.

Is that explainable?

Power consumption is also only around 70%, not full! But it is not bottlenecked by the bus interface, which sometimes reduces full GPU computation speed. I can see that in Windows with GPU-Z; I always check my GPU with it.
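(If you want to log the same readings GPU-Z shows — core load, power draw, PCIe traffic — from a script, NVIDIA's NVML bindings expose them. A minimal sketch, assuming the `nvidia-ml-py` (`pynvml`) package and a single NVIDIA GPU at index 0; this is generic monitoring code, not part of Forge:

```python
# pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)            # first GPU (assumed index 0)

util = pynvml.nvmlDeviceGetUtilizationRates(handle)      # core / memory-controller load in %
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # NVML reports milliwatts
# PCIe receive throughput in KB/s -- roughly the "bus interface load" GPU-Z shows
pcie_rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)

print(f"GPU {util.gpu}% | mem ctrl {util.memory}% | {power_w:.0f} W | PCIe RX {pcie_rx} KB/s")
pynvml.nvmlShutdown()
```

Sampling this in a loop while a generation runs would show whether the GPU core or the bus is the limiting factor.)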

lllyasviel commented 2 months ago

Do you have full logs?

kalle07 commented 2 months ago

First log without, second with one LoRA. (It's a bit faster than I mentioned, but if I use two LoRAs it's 3 s/it.)

```
To create a public link, set share=True in launch().
Startup time: 24.4s (prepare environment: 6.5s, import torch: 7.6s, initialize shared: 0.3s, other imports: 0.6s, load scripts: 3.9s, create ui: 2.9s, gradio launch: 2.3s).
Environment vars changed: {'stream': False, 'inference_memory': 4379.0, 'pin_shared_memory': False}
[GPU Setting] You will use 73.26% GPU memory (12000.00 MB) to load weights, and use 26.74% GPU memory (4379.00 MB) to do matrix computation.
Model selected: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
Using online LoRAs in FP16: True
Model selected: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
Using online LoRAs in FP16: True
Loading Model: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 1.4s (unload existing model: 0.2s, forge model load: 1.2s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 11080.00 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 15213.00 MB, Model Require: 5154.62 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 5679.38 MB, All loaded to GPU.
Moving model(s) has taken 4.99 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 12499.89 MB for cuda:0 with 0 models keep loaded ... Current free memory is 9932.75 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 15167.36 MB, Model Require: 6246.84 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4541.51 MB, All loaded to GPU.
Moving model(s) has taken 9.17 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:31<00:00, 1.56s/it]
[Unload] Trying to free 4586.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8657.97 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 8657.97 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4119.09 MB, All loaded to GPU.
Moving model(s) has taken 0.69 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:31<00:00, 1.58s/it]

[LORA] Loaded E:\WebUI_Forge\webui\models\Lora\flux\JeriR07_02_FLUX_sevenof9_adam.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys) with on_the_fly = True
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 11174.24 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8484.75 MB ... Unload model KModel Current free memory is 14970.68 MB ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 14970.68 MB, Model Require: 5227.11 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 5364.58 MB, All loaded to GPU.
Moving model(s) has taken 3.02 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 12518.10 MB for cuda:0 with 0 models keep loaded ... Current free memory is 9733.85 MB ... Unload model IntegratedAutoencoderKL Current free memory is 9897.73 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 15129.90 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 4397.26 MB, Remaining: 4485.84 MB, All loaded to GPU.
Moving model(s) has taken 2.76 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.36s/it]
[Unload] Trying to free 4586.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8641.32 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 8641.32 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4102.45 MB, All loaded to GPU.
Moving model(s) has taken 0.10 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.36s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.39s/it]
```
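(Two lines in the second run are likely the key: `Using online LoRAs in FP16: True` and `... with on_the_fly = True`. With an NF4-quantized checkpoint the LoRA is not merged into the stored weights; its low-rank correction is applied in FP16 at generation time, so every sampling step pays extra work on top of the base layer, and a second LoRA adds the same overhead again. A minimal standalone sketch of that cost model, with a hypothetical layer width `d`, LoRA rank `r`, and token count `n` — not Forge's actual code or sizes:

```python
import time
import torch

dev = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if dev == "cuda" else torch.float32
d, r, n = 4096, 32, 4096          # hypothetical: layer width, LoRA rank, tokens

W = torch.randn(d, d, device=dev, dtype=dtype)   # base weight (stand-in)
A = torch.randn(r, d, device=dev, dtype=dtype)   # LoRA down-projection
B = torch.randn(d, r, device=dev, dtype=dtype)   # LoRA up-projection
x = torch.randn(n, d, device=dev, dtype=dtype)

def bench(fn, iters=50):
    fn()                                          # warm-up
    if dev == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if dev == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3

merged = bench(lambda: x @ W.T)                        # LoRA pre-merged: one matmul
on_the_fly = bench(lambda: x @ W.T + (x @ A.T) @ B.T)  # unmerged: two extra matmuls per layer
print(f"merged: {merged:.2f} ms/iter, on-the-fly: {on_the_fly:.2f} ms/iter")
```

On a real NF4 model the gap is plausibly larger than this dense-matmul toy suggests, since keeping the correction in FP16 alongside quantized base weights also adds per-step dequantization/patching work — which would fit the observed 1.56 s/it → 2.36 s/it jump.)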

kalle07 commented 1 month ago

I know that if CFG is activated (not 1, so that I can write a negative prompt), that doubles the time (that seems usual).
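(That matches the `Skipping unconditional conditioning when CFG = 1` lines in the logs: classifier-free guidance needs two model evaluations per step, conditional and unconditional, and at CFG = 1 the unconditional branch cancels out and is skipped. A minimal sketch of the usual formulation, with `model`, `cond`, and `uncond` as placeholders rather than Forge's actual API:

```python
import torch

def cfg_denoise(model, x, sigma, cond, uncond, cfg_scale: float) -> torch.Tensor:
    """Classifier-free guidance: blend conditional and unconditional predictions."""
    eps_cond = model(x, sigma, cond)
    if cfg_scale == 1.0:
        # At CFG = 1 the unconditional term cancels, so the second
        # (negative-prompt) evaluation can be skipped -- one pass per step.
        return eps_cond
    eps_uncond = model(x, sigma, uncond)        # second full pass: ~2x step time
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)
```

Since CFG > 1 roughly doubles the per-step cost independently of LoRAs, comparing LoRA vs. non-LoRA timings is only meaningful at the same CFG setting.)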