taotaow opened 3 months ago
The full error is:
[DEBUG] model.cpp:807 - init from 'sd3_medium_incl_clips_t5xxlfp8.safetensors'
[ERROR] model.cpp:873 - unsupported dtype 'F8_E4M3'
[ERROR] stable-diffusion.cpp:182 - init model loader from file failed: 'sd3_medium_incl_clips_t5xxlfp8.safetensors'
+1 to this topic. I see FP8 models more and more, especially with the introduction of the monster FLUX models. I hope that with SD.cpp it will become possible to run new models more comfortably on old HW.
I totally forgot to check open issues, so I guess others had this issue before me :).
I did not test f8 for t5xxl specifically, but it should work. Keep in mind that it is upcast to f16, so you should convert the model down to something like q8_0.
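For reference, f8_e4m3 packs each weight into a single byte (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7), so loading means expanding every byte back into a wider float. A purely conceptual C++ sketch of that decode (not the actual model.cpp code) looks roughly like this:

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Conceptual decode of one f8_e4m3 byte: 1 sign, 4 exponent, 3 mantissa bits,
// exponent bias 7. E4M3 has no infinities; exponent and mantissa all ones is NaN.
float f8_e4m3_to_f32(uint8_t v) {
    const float sign = (v & 0x80) ? -1.0f : 1.0f;
    const int   exp  = (v >> 3) & 0x0F;
    const int   man  = v & 0x07;
    if (exp == 0x0F && man == 0x07) return NAN;              // NaN encoding
    if (exp == 0) return sign * std::ldexp(man / 8.0f, -6);  // subnormal
    return sign * std::ldexp(1.0f + man / 8.0f, exp - 7);    // normal
}

int main() {
    std::printf("%g\n", f8_e4m3_to_f32(0x38)); // 1
    std::printf("%g\n", f8_e4m3_to_f32(0x7E)); // 448, the largest finite e4m3 value
}
```

Since every fp8 byte becomes two bytes of f16 in memory, the fp8 t5xxl roughly doubles in size once loaded, which is why requantizing it to something like q8_0 is worthwhile.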
It got merged into master; go ahead and try it.
@Green-Sky thank you, it is OK.

[DEBUG] stable-diffusion.cpp:169 - Using CPU backend
[INFO ] stable-diffusion.cpp:184 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] model.cpp:789 - load .\sd3_medium_incl_clips_t5xxlfp8.safetensors using safetensors format
[DEBUG] model.cpp:857 - init from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] stable-diffusion.cpp:224 - Version: SD3 2B
[INFO ] stable-diffusion.cpp:255 - Weight type: f16
[INFO ] stable-diffusion.cpp:256 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:257 - Diffsuion model weight type: f16
[INFO ] stable-diffusion.cpp:258 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:260 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 1329.29 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1029 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1029 - mmdit params backend buffer size = 4114.77 MB(RAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1029 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:387 - loading weights
[DEBUG] model.cpp:1526 - loading tensors from .\sd3_medium_incl_clips_t5xxlfp8.safetensors
[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:486 - total params memory size = 14857.47MB (VRAM 0.00MB, RAM 14857.47MB): clip 10648.13MB(RAM), unet 4114.77MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:490 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors' completed, taking 129.76s
[INFO ] stable-diffusion.cpp:504 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:552 - finished loaded file
[DEBUG] stable-diffusion.cpp:1358 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1107 - prompt after extract and remove lora: "a lovely cat holding a sign says "Stable diffusion 3""
[INFO ] stable-diffusion.cpp:635 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1112 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:687 - parse 'a lovely cat holding a sign says "Stable diffusion 3"' to [['a lovely cat holding a sign says "Stable diffusion 3"', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 33220 ms
[DEBUG] conditioner.hpp:687 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 9685 ms
[INFO ] stable-diffusion.cpp:1236 - get_learned_condition completed, taking 42985 ms
[INFO ] stable-diffusion.cpp:1259 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1263 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:980 - mmdit compute buffer size: 1784.58 MB(RAM)
|==================================================| 20/20 - 240.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 4743.67s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 4744.97s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:980 - vae compute buffer size: 6656.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967 - computing vae [mode: DECODE] graph completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 122.87s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 4910.89s
save result image to 'output.png'
[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
Maybe the result is not perfect because of this unknown tensor?
I have seen this with flux too, but it works regardless.
Keep in mind that it upconverts f8_e4m3 in place to f16, so you might want to experiment with converting it to q8_0. https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md
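For anyone wondering why q8_0 holds up so well: it stores weights in blocks of 32 int8 values that share a single f16 scale. A simplified, illustrative sketch of that idea (the real kernels live in ggml; this is not their code):

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Simplified q8_0-style block: 32 weights share one scale.
// (ggml stores the scale as f16; a plain float keeps this sketch short.)
struct block_q8_0_sketch {
    float  d;       // per-block scale
    int8_t qs[32];  // quantized weights
};

block_q8_0_sketch quantize_block(const float* x) {
    block_q8_0_sketch b{};
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    b.d = amax / 127.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 32; ++i) b.qs[i] = (int8_t)std::lround(x[i] * id);
    return b;
}

int main() {
    float x[32];
    for (int i = 0; i < 32; ++i) x[i] = 0.01f * (i - 16);  // toy weights
    block_q8_0_sketch b = quantize_block(x);
    float max_err = 0.0f;                                   // worst rounding error
    for (int i = 0; i < 32; ++i)
        max_err = std::fmax(max_err, std::fabs(b.qs[i] * b.d - x[i]));
    std::printf("scale = %g, max error = %g\n", b.d, max_err);
}
```

One scale per 32 weights works out to roughly 8.5 bits per weight, about half of the f16 copy you otherwise end up with after the upcast.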
[ERROR] stable-diffusion.cpp:173 - init model loader from file failed: '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
The safetensors file is from https://huggingface.co/adamo1139/stable-diffusion-3-medium-ungated/