blakejrobinson opened 2 months ago
Update and try again.
Still occurring in 69ffe37f.
Log here just in case it helps:
initial startup: done in 0.023s
prepare environment:
checks: done in 0.008s
git version info: done in 0.091s
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-494-g69ffe37f
Commit hash: 69ffe37f147660f90783d1d39ac9d62d8661cb73
torch GPU test: done in 1.924s
clone repositores: done in 0.097s
run extensions installers:
adetailer: done in 0.148s
sd-webui-lobe-theme: done in 0.001s
sd-webui-pixelart: done in 0.000s
CUDA 12.4
sd-webui-reactor: done in 2.047s
run extensions_builtin installers:
extra-options-section: done in 0.001s
forge_legacy_preprocessors: done in 0.299s
forge_preprocessor_inpaint: done in 0.001s
forge_preprocessor_marigold: done in 0.000s
forge_preprocessor_normalbae: done in 0.000s
forge_preprocessor_recolor: done in 0.000s
forge_preprocessor_reference: done in 0.000s
forge_preprocessor_revision: done in 0.001s
forge_preprocessor_tile: done in 0.000s
forge_space_animagine_xl_31: done in 0.000s
forge_space_birefnet: done in 0.000s
forge_space_example: done in 0.000s
forge_space_florence_2: done in 0.000s
forge_space_geowizard: done in 0.000s
forge_space_iclight: done in 0.000s
forge_space_idm_vton: done in 0.001s
forge_space_illusion_diffusion: done in 0.000s
forge_space_photo_maker_v2: done in 0.000s
forge_space_sapiens_normal: done in 0.000s
mobile: done in 0.000s
prompt-bracket-checker: done in 0.000s
ScuNET: done in 0.000s
sd_forge_controlllite: done in 0.000s
sd_forge_controlnet: done in 0.295s
sd_forge_dynamic_thresholding: done in 0.000s
sd_forge_fooocus_inpaint: done in 0.000s
sd_forge_freeu: done in 0.000s
sd_forge_ipadapter: done in 0.001s
sd_forge_kohya_hrfix: done in 0.000s
sd_forge_latent_modifier: done in 0.000s
sd_forge_lora: done in 0.000s
sd_forge_multidiffusion: done in 0.000s
sd_forge_neveroom: done in 0.001s
sd_forge_perturbed_attention: done in 0.000s
sd_forge_sag: done in 0.000s
sd_forge_stylealign: done in 0.001s
soft-inpainting: done in 0.000s
SwinIR: done in 0.000s
Launching Web UI with arguments: --log-startup --listen
launcher: done in 0.002s
Total VRAM 24564 MB, total RAM 65423 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
import torch: done in 6.113s
import torch: done in 0.591s
import gradio: done in 0.000s
setup paths: done in 0.004s
initialize shared: done in 0.138s
other imports: done in 0.572s
Running with TLS
TLS: done in 0.002s
opts onchange: done in 0.000s
setup SD model: done in 0.000s
setup codeformer: done in 0.002s
setup gfpgan: done in 0.010s
set samplers: done in 0.001s
list extensions: done in 0.009s
restore config state file: done in 0.001s
list SD models: done in 0.018s
list localizations: done in 0.002s
load scripts:
custom_code.py: done in 0.009s
img2imgalt.py: done in 0.001s
loopback.py: done in 0.000s
outpainting_mk_2.py: done in 0.001s
poor_mans_outpainting.py: done in 0.000s
postprocessing_codeformer.py: done in 0.000s
postprocessing_focal_crop.py: done in 0.008s
postprocessing_gfpgan.py: done in 0.001s
postprocessing_upscale.py: done in 0.000s
prompt_matrix.py: done in 0.000s
prompts_from_file.py: done in 0.001s
sd_upscale.py: done in 0.000s
xyz_grid.py: done in 0.002s
scunet_model.py: done in 0.414s
swinir_model.py: done in 0.053s
extra_options_section.py: done in 0.001s
legacy_preprocessors.py: done in 0.015s
preprocessor_inpaint.py: done in 0.181s
preprocessor_marigold.py: done in 0.012s
preprocessor_normalbae.py: done in 0.007s
preprocessor_recolor.py: done in 0.000s
forge_reference.py: done in 0.001s
preprocessor_revision.py: done in 0.000s
preprocessor_tile.py: done in 0.001s
forge_controllllite.py: done in 0.012s
ControlNet preprocessor location: F:\Software\AI\stable-diffusion-webui-forge\models\ControlNetPreprocessor
controlnet.py: done in 0.934s
xyz_grid_support.py: done in 0.000s
forge_dynamic_thresholding.py: done in 0.004s
forge_fooocus_inpaint.py: done in 0.001s
forge_freeu.py: done in 0.000s
forge_ipadapter.py: done in 0.007s
kohya_hrfix.py: done in 0.000s
forge_latent_modifier.py: done in 0.004s
lora_script.py: done in 0.358s
forge_multidiffusion.py: done in 0.004s
forge_never_oom.py: done in 0.000s
forge_perturbed_attention.py: done in 0.001s
forge_sag.py: done in 0.000s
forge_stylealign.py: done in 0.001s
soft_inpainting.py: done in 0.000s
[-] ADetailer initialized. version: 24.8.0, num models: 12
!adetailer.py: done in 0.534s
settings.py: done in 0.078s
pixelart.py: done in 0.002s
postprocessing_pixelart.py: done in 0.001s
console_log_patch.py: done in 0.367s
reactor_api.py: done in 0.160s
22:23:02 - ReActor - STATUS - Running v0.7.1-a2 on Device: CUDA
reactor_faceswap.py: done in 0.005s
reactor_globals.py: done in 0.001s
reactor_helpers.py: done in 0.000s
reactor_logger.py: done in 0.001s
reactor_swapper.py: done in 0.001s
reactor_version.py: done in 0.001s
reactor_xyz.py: done in 0.080s
comments.py: done in 0.071s
refiner.py: done in 0.001s
sampler.py: done in 0.000s
seed.py: done in 0.001s
load upscalers: done in 0.005s
refresh VAE: done in 0.003s
scripts list_unets: done in 0.000s
reload hypernetworks: done in 0.005s
initialize extra networks: done in 0.003s
scripts before_ui_callback: done in 0.002s
2024-08-31 22:23:04,070 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\clip_l.safetensors', 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\t5xxl_fp8_e4m3fn.safetensors', 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
create ui: done in 2.439s
Running on local URL: https://0.0.0.0:7862
To create a public link, set `share=True` in `launch()`.
gradio launch: done in 4.792s
add APIs: done in 0.011s
app_started_callback:
controlnet.py: done in 0.005s
lora_script.py: done in 0.001s
!adetailer.py: done in 0.001s
LobeTheme: Initializing...
settings.py: done in 0.004s
reactor_api.py: done in 0.014s
Startup time: 23.0s (prepare environment: 5.0s, import torch: 6.7s, initialize shared: 0.1s, other imports: 0.6s, load scripts: 3.3s, create ui: 2.4s, gradio launch: 4.8s).
Environment vars changed: {'stream': False, 'inference_memory': 0.0, 'pin_shared_memory': False}
Loading Model: {'checkpoint_info': {'filename': 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\clip_l.safetensors', 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\t5xxl_fp8_e4m3fn.safetensors', 'F:\\Software\\AI\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: gguf
Using pre-quant state dict!
GGUF state dict: {'Q8_0': 304}
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'gguf', 'computation_dtype': torch.bfloat16}
Model loaded in 15.8s (unload existing model: 0.2s, forge model load: 15.6s).
[LORA] Loaded F:\Software\AI\stable-diffusion-webui-forge\models\Lora\Styles\flux_Gen_5_Trainer_Sprites.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys) with on_the_fly = False
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 6699.54 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 22982.00 MB, Model Require: 5153.49 MB, Previously Loaded: 0.00 MB, Inference Require: 0.00 MB, Remaining: 17828.51 MB, All loaded to GPU.
Moving model(s) has taken 2.39 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 17045.65 MB for cuda:0 with 0 models keep loaded ... Current free memory is 17659.43 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 17659.43 MB, Model Require: 12119.55 MB, Previously Loaded: 0.00 MB, Inference Require: 0.00 MB, Remaining: 5539.88 MB, All loaded to GPU.
Moving model(s) has taken 13.69 seconds
100%|██████████| 20/20 [00:13<00:00, 1.47it/s]
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ... Current free memory is 5070.08 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 5058.27 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 0.00 MB, Remaining: 4898.39 MB, All loaded to GPU.
Moving model(s) has taken 0.29 seconds
Total progress: 100%|██████████| 20/20 [00:13<00:00, 1.47it/s]
Total progress: 100%|██████████| 20/20 [00:13<00:00, 1.48it/s]
Did you fix the problem?
Nope, it's still occurring in the latest commit, f40930c5.
Since ba01ad37, LoRAs loaded in 8-bit onto the Q8_0 GGUF produce poor-quality generations. Loading the LoRA in 16-bit appears to fix the issue, though it introduces subtle differences in the generations from rounding.
This does not seem to happen with the FP8 safetensors checkpoint or with NF4 - only the Q8_0 GGUF. It also does not happen at commit ba01ad37 and earlier.
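For anyone curious why the precision of the LoRA path matters here, below is a minimal sketch of the difference between applying a LoRA delta in full precision after dequantizing versus baking it back into an 8-bit grid. This is plain PyTorch with a simplified per-block symmetric int8 scheme loosely in the spirit of Q8_0; it is not Forge's actual GGUF code, and the function names, block size, and delta magnitude are all hypothetical stand-ins.

```python
import torch

def q8_quant(w, block=32):
    # Hypothetical simplification of Q8_0: per-block symmetric int8
    # quantization with scale = max(|w|) / 127 per block.
    w = w.reshape(-1, block)
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.round(w / scale).clamp(-127, 127)
    return q, scale

def q8_dequant(q, scale):
    return (q * scale).reshape(-1)

torch.manual_seed(0)
w = torch.randn(4096)             # stand-in for a UNet weight tensor
delta = 0.01 * torch.randn(4096)  # stand-in for a small LoRA delta

q, s = q8_quant(w)

# Path A (16-bit LoRA): dequantize, then add the delta in full precision.
a = q8_dequant(q, s) + delta

# Path B (8-bit LoRA): add the delta, then round back onto the int8 grid.
# Deltas smaller than the block's quantization step are partly lost.
qb, sb = q8_quant(q8_dequant(q, s) + delta)
b = q8_dequant(qb, sb)

ref = w + delta
a_err = (a - ref).abs().mean().item()
b_err = (b - ref).abs().mean().item()
print("fp-path mean abs error:  ", a_err)
print("int8-path mean abs error:", b_err)
```

In this toy setup the int8 path accumulates a second rounding on top of the original quantization error, which is consistent with the quality loss described above, though the real cause in Forge may of course be a bug rather than rounding alone.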
Example at commit 3b9b2f65:
<lora:flux_Gen_5_Trainer_Sprites:1> A pixelart drawing of a chicken
Diffusion in Low bits set to Automatic:
Diffusion in Low bits set to Automatic (FP16 Lora)
Example at commit ba01ad37 (the last testable commit before the 8-bit LoRA handling changed):
Diffusion in Low bits set to Automatic:
Diffusion in Low bits set to Automatic (FP16 Lora):
Generations with the FP8 safetensors checkpoint for comparison:
Diffusion in Low bits set to Automatic:
Diffusion in Low bits set to Automatic (FP16 Lora):
And here is the NF4 for comparison:
Diffusion in Low bits set to Automatic:
Diffusion in Low bits set to Automatic (FP16 Lora):
(The LoRA used here was https://civitai.com/models/704779/flux-gen-5-trainer-sprites, but the issue appears to occur with all LoRAs.)