Open WingeD123 opened 2 months ago
Show the full log and make sure that it has a line like this:
Using Default T5 Data Type: torch.float16
Tried again from a clean start; things were a little different. I generated 3 images: img1 with t5 fp16, img2 with t5 fp8, img3 with t5 fp16, settings as below, full log attached. Images 1 and 2 came out different, but 2 and 3 are the same, even though 1 and 3 use the same setting. img_log.txt
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-311-g3da7de41
Commit hash: 3da7de418a102a261cebaeba545104903f5cae9d
Launching Web UI with arguments: --theme dark --port 7868 --cuda-malloc --ckpt-dir D:/sd-models/Checkpoint --lora-dir D:/sd-models/Lora
Using cudaMallocAsync backend.
Total VRAM 16380 MB, total RAM 32556 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Ti : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
D:\sd-webui-forge2\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE
is deprecated and will be removed in v5 of Transformers. Use HF_HOME
instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: D:\sd-webui-forge2\webui\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
2024-08-17 15:52:44,829 - ControlNet - INFO - ControlNet UI callback registered.
Checkpoint flux00001.safetensors not found; loading fallback cosxl.safetensors
Model selected: {'checkpoint_info': {'filename': 'D:\sd-models\Checkpoint\cosxl.safetensors', 'hash': 'a01919e6'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://127.0.0.1:7868
To create a public link, set `share=True` in `launch()`.
Startup time: 17.3s (prepare environment: 3.8s, launcher: 2.2s, import torch: 3.6s, initialize shared: 0.2s, other imports: 1.3s, load scripts: 2.8s, create ui: 2.0s, gradio launch: 1.4s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': [], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 0.7s (unload existing model: 0.1s, forge model load: 0.5s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 13464.34 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 15227.00 MB
[Memory Management] Required Model Memory: 9569.49 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4633.51 MB
Moving model(s) has taken 6.90 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16065.81 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5539.66 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 15181.65 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2807.58 MB
Moving model(s) has taken 19.71 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.55s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.58s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3686.01 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15141.40 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13957.52 MB
Moving model(s) has taken 5.95 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:10<00:00, 2.81s/it]
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 14972.52 MB ...
[Unload] Unload model IntegratedAutoencoderKL
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 18.5s (unload existing model: 0.7s, forge model load: 17.7s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7723.54 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 3649.46 MB
[Memory Management] Required Model Memory: 5153.49 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: -2528.03 MB
[Memory Management] Loaded to GPU for backward capability: 73.14 MB
[Memory Management] Loaded to Shared Swap: 3214.00 MB (blocked method)
[Memory Management] Loaded to GPU: 2011.87 MB
Moving model(s) has taken 8.13 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 1310.72 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1244.04 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 3783.33 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2759.33 MB
Moving model(s) has taken 0.76 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.53s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.62s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3648.20 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15131.40 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13947.52 MB
Moving model(s) has taken 17.59 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:21<00:00, 3.25s/it]
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': [], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 14972.52 MB ...
[Unload] Unload model IntegratedAutoencoderKL
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 22.6s (unload existing model: 0.9s, forge model load: 21.7s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 1310.72 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 3520.33 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2496.33 MB
Moving model(s) has taken 0.01 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.54s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.58s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3635.24 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15135.62 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13951.75 MB
Moving model(s) has taken 13.07 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:17<00:00, 3.08s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:17<00:00, 2.58s/it]
If the images are different, then it works.
If you are worried about underlying behavior, you can use this to test whether you get the same images elsewhere with the fp16 T5.
But I do not recommend it, since Forge will later try to perfectly reproduce results from Black Forest Labs. Currently no software can do that, as far as I know.
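A quick way to make the "are these images the same" comparison objective is to hash the output files instead of eyeballing them; byte-identical files have identical digests. A minimal stdlib sketch (the byte strings stand in for the PNG contents you would read from disk):

```python
import hashlib

def image_digest(data: bytes) -> str:
    """SHA-256 hex digest of raw image file bytes."""
    return hashlib.sha256(data).hexdigest()

# In practice you would read the generated files, e.g.:
#   data = open("img1.png", "rb").read()
# Stand-in byte strings demonstrate the comparison here.
img1 = b"fake-png-bytes-fp16-run-1"
img2 = b"fake-png-bytes-fp8-run"
img3 = b"fake-png-bytes-fp16-run-1"  # identical bytes -> identical digest

print(image_digest(img1) == image_digest(img2))  # False: different outputs
print(image_digest(img1) == image_digest(img3))  # True: byte-identical outputs
```

Note that a hash only proves exact equality; two images can look indistinguishable yet hash differently, so a mismatch alone does not prove a visible difference.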
Different, but not as expected: once I have used t5 fp8, I can't get t5 fp16 back unless I restart. Sorry for my earlier words, which were not proper. Forge is my favorite UI and always will be.
i can't get t5fp16 back unless restart.
I do not think that is even possible in the Forge architecture. Forge never reuses modules or reloads weights the way A1111 does.
Show the full log as evidence of t5 fp16 not coming back.
The log shows no evidence, but the generated images do: in my last case I used t5 fp16 for img1 and img3, yet img2 (t5 fp8) and img3 are the same.
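For what it's worth, this symptom (fp16 results matching fp8 after switching back) is exactly what a stale cache would produce, i.e. a loader that keys the cached text encoder on something that ignores the selected file. This is a purely hypothetical illustration of that failure mode, not Forge's actual code; all names here are made up:

```python
# Hypothetical stale-cache bug: the cache key ignores the file path and dtype,
# so the first text encoder loaded is silently returned forever after.
_cache: dict[str, str] = {}

def load_text_encoder(path: str, dtype: str) -> str:
    key = "text_encoder_2"          # bug: keyed by component name only
    if key not in _cache:
        _cache[key] = f"{path}@{dtype}"
    return _cache[key]

print(load_text_encoder("t5xxl_fp8.safetensors", "float8_e4m3fn"))
print(load_text_encoder("t5xxl_fp16.safetensors", "float16"))  # still the fp8 entry
```

A correct implementation would include the path and dtype in the cache key, so switching files forces a fresh load.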
I tried starting with t5 fp16 and never changing it; every image generates correctly.
I tried many UNet settings: dev fp16 with Diffusion in Low Bits set to Automatic, dev fp16 with fp8 e4m3fn in Low Bits, and dev fp8_e4m3fn with Automatic. But for every single UNet setting, I got the same images with t5 fp8 and t5 fp16. I guess T5 is always being cast to fp8?
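One way to rule out the on-disk files themselves is to read each safetensors header, which records the stored dtype of every tensor (the format is an 8-byte little-endian header length followed by a JSON table). A minimal sketch; the tensor name in the demo is fabricated, and the demo writes a tiny fake file rather than reading a real checkpoint:

```python
import json
import struct
import tempfile

def safetensors_dtypes(path: str) -> set[str]:
    """Collect the distinct tensor dtypes recorded in a safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {v["dtype"] for k, v in header.items() if k != "__metadata__"}

# Demo: build a one-tensor file in the safetensors layout (made-up tensor name).
entry = {"t5.encoder.block.0.weight": {"dtype": "F16", "shape": [2], "data_offsets": [0, 4]}}
header_bytes = json.dumps(entry).encode()
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00\x00\x00\x00")
    tmp = f.name

print(safetensors_dtypes(tmp))  # {'F16'}
```

If the fp16 T5 file really reports `F16` on disk, any fp8 behavior would have to come from a runtime cast or cached module, not from the file.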