Precompiled ROCm binary can't generate an image

brollyssj82000 commented 2 months ago

I've tried using the precompiled ROCm binary because I have a 7900 XTX, and the result is the image below. I also tried compiling it myself but got a lot of errors. I have ROCm 6.1 installed as well, in case it matters.

Can you please tell me what I'm doing wrong?

This is the command I used in Windows 10 PowerShell:

.\sd.exe -m .\OpenDalleV11.safetensors -o test.png --cfg-scale 7 -H 512 -W 512 --steps 40 -b 1 --type q8_0 --vae .\sdxl_vae.safetensors --prompt "A lovely cat" -v

This is the log:

Option:
    n_threads: 12
    mode: txt2img
    model_path: .\OpenDalleV11.safetensors
    wtype: q8_0
    vae_path: .\sdxl_vae.safetensors
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio: 20.00
    normzalize input image : false
    output_path: test.png
    init_img:
    control_image:
    clip on cpu: false
    controlnet cpu: false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt: A lovely cat
    negative_prompt:
    min_cfg: 1.00
    cfg_scale: 7.00
    clip_skip: -1
    width: 512
    height: 512
    sample_method: euler_a
    schedule: default
    sample_steps: 40
    strength(img2img): 0.75
    rng: cuda
    seed: 42
    batch_count: 1
    vae_tiling: false
    upscale_repeats: 1
System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:147 - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
[INFO ] stable-diffusion.cpp:171 - loading model from '.\OpenDalleV11.safetensors'
[INFO ] model.cpp:737 - load .\OpenDalleV11.safetensors using safetensors format
[DEBUG] model.cpp:803 - init from '.\OpenDalleV11.safetensors'
[INFO ] stable-diffusion.cpp:182 - loading vae from '.\sdxl_vae.safetensors'
[INFO ] model.cpp:737 - load .\sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:803 - init from '.\sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:194 - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:200 - Stable Diffusion weight type: q8_0
[DEBUG] stable-diffusion.cpp:201 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:990 - clip params backend buffer size = 125.22 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:990 - clip params backend buffer size = 710.31 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:990 - unet params backend buffer size = 2925.36 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:990 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:323 - loading weights
[DEBUG] model.cpp:1389 - loading tensors from .\OpenDalleV11.safetensors
[INFO ] model.cpp:1535 - unknown tensor 'cond_stage_model.logit_scale | f16 | 1 [1, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:1535 - unknown tensor 'cond_stage_model.text_projection | f16 | 2 [768, 768, 1, 1, 1]' in model file
[DEBUG] model.cpp:1389 - loading tensors from .\sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:422 - total params memory size = 3855.36MB (VRAM 3855.36MB, RAM 0.00MB): clip 835.53MB(VRAM), unet 2925.36MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:426 - loading model from '.\OpenDalleV11.safetensors' completed, taking 8.01s
[INFO ] stable-diffusion.cpp:446 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:481 - finished loaded file
[DEBUG] stable-diffusion.cpp:1265 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1018 - prompt after extract and remove lora: "A lovely cat"
[INFO ] stable-diffusion.cpp:564 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1023 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:325 - parse 'A lovely cat' to [['A lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] conditioner.hpp:453 - computing condition graph completed, taking 155 ms
[DEBUG] conditioner.hpp:325 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:941 - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] conditioner.hpp:453 - computing condition graph completed, taking 26 ms
[INFO ] stable-diffusion.cpp:1147 - get_learned_condition completed, taking 185 ms
[INFO ] stable-diffusion.cpp:1168 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1172 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:941 - unet compute buffer size: 132.05 MB(VRAM)
|==================================================| 40/40 - 9.33it/s
[INFO ] stable-diffusion.cpp:1203 - sampling completed, taking 4.77s
[INFO ] stable-diffusion.cpp:1211 - generating 1 latent images completed, taking 4.79s
[INFO ] stable-diffusion.cpp:1214 - decoding 1 latents
[DEBUG] ggml_extend.hpp:941 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:888 - computing vae [mode: DECODE] graph completed, taking 0.60s
[INFO ] stable-diffusion.cpp:1224 - latent 1 decoded, taking 0.60s
[INFO ] stable-diffusion.cpp:1228 - decode_first_stage completed, taking 0.60s
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 5.58s
save result image to 'test.png'

This is the generated image:

[image: test.png]

offbeat-stuff commented 2 months ago

Try a standard command to check whether it really fails.

One from the examples would be fine.

offbeat-stuff commented 2 months ago

You can also try running https://github.com/leejet/stable-diffusion.cpp/pull/291. It depends on the Vulkan drivers.

brollyssj82000 commented 2 months ago

I tried using a standard command from the examples and the result is the same.

leejet / stable-diffusion.cpp, issue #348