taotaow opened 3 months ago
The full error is:
[DEBUG] model.cpp:807 - init from 'sd3_medium_incl_clips_t5xxlfp8.safetensors'
[ERROR] model.cpp:873 - unsupported dtype 'F8_E4M3'
[ERROR] stable-diffusion.cpp:182 - init model loader from file failed: 'sd3_medium_incl_clips_t5xxlfp8.safetensors'
+1 to this topic. I see FP8 models more and more, especially with the introduction of the monster FLUX models. I hope that with SD.cpp it will become possible to run new models more comfortably on old HW.
I totally forgot to check open issues, so I guess others had this issue before me :).
I did not test f8 for t5xxl specifically, but it should work. Keep in mind that it is upcast to f16, so you should convert the model down to something like q8_0.
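For reference, f8_e4m3 packs each weight into a single byte (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7), so loading means expanding every byte back into a wider float. A purely conceptual C++ sketch of that decode (not the actual model.cpp code) looks roughly like this:

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Conceptual decode of one f8_e4m3 byte: 1 sign, 4 exponent, 3 mantissa bits,
// exponent bias 7. E4M3 has no infinities; exponent and mantissa all ones is NaN.
float f8_e4m3_to_f32(uint8_t v) {
    const float sign = (v & 0x80) ? -1.0f : 1.0f;
    const int   exp  = (v >> 3) & 0x0F;
    const int   man  = v & 0x07;
    if (exp == 0x0F && man == 0x07) return NAN;              // NaN encoding
    if (exp == 0) return sign * std::ldexp(man / 8.0f, -6);  // subnormal
    return sign * std::ldexp(1.0f + man / 8.0f, exp - 7);    // normal
}

int main() {
    std::printf("%g\n", f8_e4m3_to_f32(0x38)); // 1
    std::printf("%g\n", f8_e4m3_to_f32(0x7E)); // 448, the largest finite e4m3 value
}
```

Since every fp8 byte becomes two bytes of f16 in memory, the fp8 t5xxl roughly doubles in size once loaded, which is why requantizing it to something like q8_0 is worthwhile.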
It got merged into master; go ahead and try it.
@Green-Sky thank you, it is OK.

[DEBUG] stable-diffusion.cpp:169 - Using CPU backend
[INFO ] stable-diffusion.cpp:184 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] model.cpp:789 - load .\sd3_medium_incl_clips_t5xxlfp8.safetensors using safetensors format
[DEBUG] model.cpp:857 - init from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
[INFO ] stable-diffusion.cpp:224 - Version: SD3 2B
[INFO ] stable-diffusion.cpp:255 - Weight type: f16
[INFO ] stable-diffusion.cpp:256 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:257 - Diffsuion model weight type: f16
[INFO ] stable-diffusion.cpp:258 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:260 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1029 - clip params backend buffer size = 1329.29 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1029 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1029 - mmdit params backend buffer size = 4114.77 MB(RAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1029 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:387 - loading weights
[DEBUG] model.cpp:1526 - loading tensors from .\sd3_medium_incl_clips_t5xxlfp8.safetensors
[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:486 - total params memory size = 14857.47MB (VRAM 0.00MB, RAM 14857.47MB): clip 10648.13MB(RAM), unet 4114.77MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:490 - loading model from '.\sd3_medium_incl_clips_t5xxlfp8.safetensors' completed, taking 129.76s
[INFO ] stable-diffusion.cpp:504 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:552 - finished loaded file
[DEBUG] stable-diffusion.cpp:1358 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1107 - prompt after extract and remove lora: "a lovely cat holding a sign says "Stable diffusion 3""
[INFO ] stable-diffusion.cpp:635 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1112 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:687 - parse 'a lovely cat holding a sign says "Stable diffusion 3"' to [['a lovely cat holding a sign says "Stable diffusion 3"', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 33220 ms
[DEBUG] conditioner.hpp:687 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:980 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:930 - computing condition graph completed, taking 9685 ms
[INFO ] stable-diffusion.cpp:1236 - get_learned_condition completed, taking 42985 ms
[INFO ] stable-diffusion.cpp:1259 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1263 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:980 - mmdit compute buffer size: 1784.58 MB(RAM)
|==================================================| 20/20 - 240.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 4743.67s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 4744.97s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:980 - vae compute buffer size: 6656.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967 - computing vae [mode: DECODE] graph completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 122.87s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 122.87s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 4910.89s
save result image to 'output.png'
[INFO ] model.cpp:1681 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
Maybe the result is not perfect because of this unknown tensor?
I have seen this with flux too, but it works regardless.
Keep in mind that it upconverts f8_e4m3 in place to f16, so you might want to experiment with converting it to q8_0. https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md
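For anyone wondering why q8_0 holds up so well: it stores weights in blocks of 32 int8 values that share a single f16 scale. A simplified, illustrative sketch of that idea (the real kernels live in ggml; this is not their code):

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Simplified q8_0-style block: 32 weights share one scale.
// (ggml stores the scale as f16; a plain float keeps this sketch short.)
struct block_q8_0_sketch {
    float  d;       // per-block scale
    int8_t qs[32];  // quantized weights
};

block_q8_0_sketch quantize_block(const float* x) {
    block_q8_0_sketch b{};
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    b.d = amax / 127.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 32; ++i) b.qs[i] = (int8_t)std::lround(x[i] * id);
    return b;
}

int main() {
    float x[32];
    for (int i = 0; i < 32; ++i) x[i] = 0.01f * (i - 16);  // toy weights
    block_q8_0_sketch b = quantize_block(x);
    float max_err = 0.0f;                                   // worst rounding error
    for (int i = 0; i < 32; ++i)
        max_err = std::fmax(max_err, std::fabs(b.qs[i] * b.d - x[i]));
    std::printf("scale = %g, max error = %g\n", b.d, max_err);
}
```

One scale per 32 weights works out to roughly 8.5 bits per weight, about half of the f16 copy you otherwise end up with after the upcast.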
[ERROR] stable-diffusion.cpp:173 - init model loader from file failed: '.\sd3_medium_incl_clips_t5xxlfp8.safetensors'
The safetensors file is from https://huggingface.co/adamo1139/stable-diffusion-3-medium-ungated/