kalle07 opened 2 months ago
Do you have the full logs?
First log without, second with one LoRA (a bit faster than with two, as I mentioned; with 2 LoRAs it's 3 s/it):
To create a public link, set share=True in launch().
Startup time: 24.4s (prepare environment: 6.5s, import torch: 7.6s, initialize shared: 0.3s, other imports: 0.6s, load scripts: 3.9s, create ui: 2.9s, gradio launch: 2.3s).
Environment vars changed: {'stream': False, 'inference_memory': 4379.0, 'pin_shared_memory': False}
[GPU Setting] You will use 73.26% GPU memory (12000.00 MB) to load weights, and use 26.74% GPU memory (4379.00 MB) to do matrix computation.
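For context, the 73.26 % / 26.74 % split above is just the `inference_memory` value from the environment vars subtracted from the total VRAM budget. A minimal sketch of the arithmetic (variable names are my assumptions, not Forge's code):

```python
# Hypothetical reconstruction of the "[GPU Setting]" split; names are illustrative.
total_vram_mb = 12000.0 + 4379.0        # weights budget + computation budget from the log
inference_memory_mb = 4379.0            # 'inference_memory' from the environment vars
weights_mb = total_vram_mb - inference_memory_mb

weights_pct = 100.0 * weights_mb / total_vram_mb
compute_pct = 100.0 * inference_memory_mb / total_vram_mb
print(f"{weights_pct:.2f}% for weights, {compute_pct:.2f}% for computation")
# matches the log: 73.26% (12000.00 MB) / 26.74% (4379.00 MB)
```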
Model selected: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
Using online LoRAs in FP16: True
Model selected: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
Using online LoRAs in FP16: True
Loading Model: {'checkpoint_info': {'filename': 'E:\WebUI_Forge\webui\models\Stable-diffusion\flux1DevV1V2Flux1_flux1DevBNBNF4V2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': 'nf4'}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 1.4s (unload existing model: 0.2s, forge model load: 1.2s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 11080.00 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 15213.00 MB, Model Require: 5154.62 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 5679.38 MB, All loaded to GPU.
Moving model(s) has taken 4.99 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 12499.89 MB for cuda:0 with 0 models keep loaded ... Current free memory is 9932.75 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 15167.36 MB, Model Require: 6246.84 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4541.51 MB, All loaded to GPU.
Moving model(s) has taken 9.17 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:31<00:00, 1.56s/it]
[Unload] Trying to free 4586.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8657.97 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 8657.97 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4119.09 MB, All loaded to GPU.
Moving model(s) has taken 0.69 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:31<00:00, 1.58s/it]
[LORA] Loaded E:\WebUI_Forge\webui\models\Lora\flux\JeriR07_02_FLUX_sevenof9_adam.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys) with on_the_fly = True
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 11174.24 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8484.75 MB ... Unload model KModel Current free memory is 14970.68 MB ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 14970.68 MB, Model Require: 5227.11 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 5364.58 MB, All loaded to GPU.
Moving model(s) has taken 3.02 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 12518.10 MB for cuda:0 with 0 models keep loaded ... Current free memory is 9733.85 MB ... Unload model IntegratedAutoencoderKL Current free memory is 9897.73 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 15129.90 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 4397.26 MB, Remaining: 4485.84 MB, All loaded to GPU.
Moving model(s) has taken 2.76 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.36s/it]
[Unload] Trying to free 4586.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 8641.32 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 8641.32 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 4379.00 MB, Remaining: 4102.45 MB, All loaded to GPU.
Moving model(s) has taken 0.10 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.36s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:47<00:00, 2.39s/it]
I know that if CFG is activated (not 1), so that I can write a negative prompt, that doubles the time (that seems usual).
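That doubling is indeed expected: with CFG ≠ 1 every sampling step evaluates the model twice, once on the positive prompt and once on the negative one, which is why the log says the unconditional pass is skipped at CFG = 1. A minimal sketch of classifier-free guidance (not Forge's actual sampler code):

```python
# Minimal sketch of why CFG != 1 doubles the per-step work:
# two model evaluations per step instead of one.
def step(model, x, cond, uncond, cfg_scale):
    if cfg_scale == 1.0:
        return model(x, cond)                 # negative prompt ignored, one pass
    pos = model(x, cond)                      # conditional pass
    neg = model(x, uncond)                    # unconditional pass (negative prompt)
    return neg + cfg_scale * (pos - neg)      # classifier-free guidance mix

calls = {"n": 0}
def fake_model(x, c):
    calls["n"] += 1
    return x

step(fake_model, 1.0, "cond", "uncond", 1.0)   # CFG = 1: one model call
step(fake_model, 1.0, "cond", "uncond", 3.5)   # CFG != 1: two model calls
print(calls["n"])  # 3 calls total
```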
(RTX 4060) OK, I tried both with and without a LoRA, with different samplers and schedulers (resolution 1024x768). If I generate an image with a LoRA the speed is ~3 s/it; under the same conditions without one it is ~1.5 s/it.
Is that explainable?
And the power consumption is also only around 70%, not full! But it is not limited by the bus interface, which sometimes reduces the full GPU calculation speed. I see that in Windows with GPU-Z; I always check my GPU with it.
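One plausible explanation for the slowdown is already visible in the log: "Using online LoRAs in FP16: True" and "on_the_fly = True". With an NF4-quantized base model, the FP16 LoRA delta cannot simply be merged into the quantized weight matrix once up front, so the extra low-rank matmuls run on every forward pass of every step. A hedged sketch (illustrative names and shapes, not Forge's internals):

```python
# Sketch of merged vs. "online" (on-the-fly) LoRA application.
# In Forge the base weight W is NF4-quantized, so the FP16 delta A @ B
# cannot be folded into it once; the extra work repeats every step.
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8
W = rng.standard_normal((d, d)).astype(np.float32)   # base weight (NF4 in Forge)
A = rng.standard_normal((d, r)).astype(np.float32)   # LoRA down-projection
B = rng.standard_normal((r, d)).astype(np.float32)   # LoRA up-projection
x = rng.standard_normal((1, d)).astype(np.float32)

merged = x @ (W + A @ B)          # offline: merge once, then one matmul per forward
online = x @ W + (x @ A) @ B      # online: extra low-rank matmuls every forward
print(np.allclose(merged, online, atol=1e-3))  # same result, more work per step
```

Both paths produce the same output, but the online path pays the A/B matmuls (plus any dequantization overhead) at every denoising step, which would be consistent with ~1.5 s/it growing toward ~2.4-3 s/it and the GPU sitting below full power draw.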