lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

can't run t5 in fp16? #1222

Open WingeD123 opened 3 months ago

WingeD123 commented 3 months ago

I tried many UNet settings: dev-fp16 with "Automatic" Diffusion in Low Bits, dev-fp16 with fp8_e4m3fn in Low Bits, and dev-fp8_e4m3fn with "Automatic" Diffusion in Low Bits. But for every single UNet setting I got the same images with t5-fp8 and t5-fp16. I guess T5 is always cast to fp8?
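As an aside, one quick way to confirm which dtype a loaded text-encoder module actually holds is to inspect its parameter dtypes directly. A minimal sketch with a toy module; `t5_like` is a hypothetical stand-in, not Forge's actual T5 class:

```python
import torch
import torch.nn as nn

# Toy stand-in for a text-encoder module (hypothetical); the same
# check works on any nn.Module a UI has already loaded.
t5_like = nn.Linear(8, 8).to(torch.float16)

# Collect the storage dtypes of all parameters.
param_dtypes = {p.dtype for p in t5_like.parameters()}
print(param_dtypes)  # {torch.float16} if nothing cast the weights down to fp8
```

If the set contained `torch.float8_e4m3fn` instead, the module really had been cast down.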

lllyasviel commented 3 months ago

Show the full log and make sure your settings look something like this:

[image]

The log should contain a line like:

Using Default T5 Data Type: torch.float16
WingeD123 commented 3 months ago

> Show the full log and make sure your settings look something like this:
>
> [image]
>
> The log should contain a line like:
>
> Using Default T5 Data Type: torch.float16

Tried again from a clean start, and things were a little different. I generated 3 images: img1 with t5-fp16, img2 with t5-fp8, img3 with t5-fp16, settings as below; full log attached. Img 1 and 2 came out different, but 2 and 3 were the same, even though 1 and 3 used the same settings. (Attachments: img_log.txt, 00119-flux1-dev-fp8_e4m3fn-2020195062, 00120-flux1-dev-fp8_e4m3fn-2020195062, 00121-flux1-dev-fp8_e4m3fn-2020195062)

WingeD123 commented 3 months ago

```
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-311-g3da7de41
Commit hash: 3da7de418a102a261cebaeba545104903f5cae9d
Launching Web UI with arguments: --theme dark --port 7868 --cuda-malloc --ckpt-dir D:/sd-models/Checkpoint --lora-dir D:/sd-models/Lora
Using cudaMallocAsync backend.
Total VRAM 16380 MB, total RAM 32556 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Ti : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
D:\sd-webui-forge2\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: D:\sd-webui-forge2\webui\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
2024-08-17 15:52:44,829 - ControlNet - INFO - ControlNet UI callback registered.
Checkpoint flux00001.safetensors not found; loading fallback cosxl.safetensors
Model selected: {'checkpoint_info': {'filename': 'D:\sd-models\Checkpoint\cosxl.safetensors', 'hash': 'a01919e6'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://127.0.0.1:7868

To create a public link, set share=True in launch().
Startup time: 17.3s (prepare environment: 3.8s, launcher: 2.2s, import torch: 3.6s, initialize shared: 0.2s, other imports: 1.3s, load scripts: 2.8s, create ui: 2.0s, gradio launch: 1.4s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': [], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 0.7s (unload existing model: 0.1s, forge model load: 0.5s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 13464.34 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 15227.00 MB
[Memory Management] Required Model Memory: 9569.49 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4633.51 MB
Moving model(s) has taken 6.90 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16065.81 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5539.66 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 15181.65 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2807.58 MB
Moving model(s) has taken 19.71 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.55s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.58s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3686.01 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15141.40 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13957.52 MB
Moving model(s) has taken 5.95 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:10<00:00, 2.81s/it]
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 14972.52 MB ...
[Unload] Unload model IntegratedAutoencoderKL
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 18.5s (unload existing model: 0.7s, forge model load: 17.7s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7723.54 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 3649.46 MB
[Memory Management] Required Model Memory: 5153.49 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: -2528.03 MB
[Memory Management] Loaded to GPU for backward capability: 73.14 MB
[Memory Management] Loaded to Shared Swap: 3214.00 MB (blocked method)
[Memory Management] Loaded to GPU: 2011.87 MB
Moving model(s) has taken 8.13 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 1310.72 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1244.04 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 3783.33 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2759.33 MB
Moving model(s) has taken 0.76 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.53s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.62s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3648.20 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15131.40 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13947.52 MB
Moving model(s) has taken 17.59 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:21<00:00, 3.25s/it]
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': [], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors'], 'unet_storage_dtype': None}
Model selected: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Loading Model: {'checkpoint_info': {'filename': 'D:\sd-webui-forge2\webui\models\Stable-diffusion\Unet\FLUX\flux1-dev-fp8_e4m3fn.safetensors', 'hash': '26acbda5'}, 'additional_modules': ['D:\sd-webui-forge2\webui\models\VAE\flux-ae.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\clip_l.safetensors', 'D:\sd-webui-forge2\webui\models\text_encoder\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 14972.52 MB ...
[Unload] Unload model IntegratedAutoencoderKL
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 22.6s (unload existing model: 0.9s, forge model load: 21.7s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 1310.72 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 3520.33 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2496.33 MB
Moving model(s) has taken 0.01 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [01:03<00:00, 2.54s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 25/25 [01:02<00:00, 2.58s/it]
Begin to load 1 model
[Unload] Trying to free 4563.84 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 3635.24 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 15135.62 MB
[Memory Management] Required Model Memory: 159.87 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 13951.75 MB
Moving model(s) has taken 13.07 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:17<00:00, 3.08s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 25/25 [01:17<00:00, 2.58s/it]
```
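For reference, the `[Memory Management]` lines in the log follow simple arithmetic: estimated remaining memory is free memory minus model memory minus inference memory, and a negative result is what triggers the swap path seen in the t5-fp8 run. A minimal sketch of that bookkeeping; the function name and return shape are illustrative, not Forge's actual API:

```python
def plan_load(free_mb: float, model_mb: float, inference_mb: float = 1024.0) -> dict:
    """Mimic the [Memory Management] arithmetic from the log (illustrative only)."""
    remaining = free_mb - model_mb - inference_mb
    if remaining >= 0:
        return {"fits": True, "remaining_mb": remaining}
    # Negative remainder: part of the model must be kept in shared/swap memory.
    return {"fits": False, "swap_needed_mb": -remaining}

# t5-fp16 run: 15227.00 MB free, 9569.49 MB model -> 4633.51 MB remaining.
print(plan_load(15227.00, 9569.49))
# t5-fp8 run: 3649.46 MB free, 5153.49 MB model -> does not fit, swap needed.
print(plan_load(3649.46, 5153.49))
```

This matches the `Estimated Remaining GPU Memory` values of 4633.51 MB and -2528.03 MB in the two runs above.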

lllyasviel commented 3 months ago

If the images are different, then it works.

If you are worried about the underlying details, you can use this to test whether you get the same images as other software with an fp16 T5:

[image]

But I do not recommend it, since Forge will later try to perfectly reproduce results from Black Forest Labs. As far as I know, no software can currently do that.

WingeD123 commented 3 months ago

> If the images are different, then it works.
>
> If you are worried about the underlying details, you can use this to test whether you get the same images as other software with an fp16 T5:
>
> [image]
>
> But I do not recommend it, since Forge will later try to perfectly reproduce results from Black Forest Labs. As far as I know, no software can currently do that.

Different, but not as expected: once I used t5-fp8, I can't get t5-fp16 back unless I restart. Sorry for my earlier words, which were not proper. Forge is my favorite UI and always will be.

lllyasviel commented 3 months ago

> I can't get t5-fp16 back unless I restart.

I do not think that is even possible in the Forge architecture. Forge never reuses modules or reloads weights the way A1111 does.

Show the full log as evidence of t5-fp16 not coming back.

WingeD123 commented 3 months ago

> > I can't get t5-fp16 back unless I restart.
>
> I do not think that is even possible in the Forge architecture. Forge never reuses modules or reloads weights the way A1111 does.
>
> Show the full log as evidence of t5-fp16 not coming back.

The log shows no evidence, but the generated images do: in my last case I used t5-fp16 for img1 & img3, yet img2 (t5-fp8) and img3 came out identical.

WingeD123 commented 3 months ago

Tried starting with t5-fp16 and never changing it; every image generates fine.