I'm running an 8-bit quantized version of SDXL Turbo on an RTX 3060 Laptop GPU. The txt2img step itself takes around 2.5 s, but opening the model takes ~25 s. I want to generate multiple images from the same prompt, so I did the following:
import os
from tqdm import tqdm

for i in tqdm(range(16)):
    os.system(f"./bin/sd -m ../models/sd_xl_turbo_1.0.q8_0.gguf --vae ../models/sdxl_vae.safetensors -s -1 -p 'a cute cat' --cfg-scale 1.0 --steps 4 -o pics/output_{i}.png")
I noticed in the logs that the model is re-opened on every iteration. Is there a way to load the model once and then generate multiple images sequentially?
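One workaround I looked at, in case it helps frame the question: if the sd binary supports a batch-count option, all 16 images could come from a single invocation, paying the model-load cost only once. A minimal sketch of how I'd build that command (the `-b`/`--batch-count` flag is an assumption about the build, and the output naming for multiple images varies by version; `./bin/sd --help` would confirm both):

```python
# Sketch of a single-invocation alternative, so the ~25 s model load
# happens only once for all 16 images.
#
# ASSUMPTION: this build of stable-diffusion.cpp exposes a batch-count
# option ("-b" / "--batch-count"); confirm with `./bin/sd --help`.

def build_sd_command(n_images: int, prompt: str) -> list[str]:
    """Build the argv list for one sd run that generates n_images images."""
    return [
        "./bin/sd",
        "-m", "../models/sd_xl_turbo_1.0.q8_0.gguf",
        "--vae", "../models/sdxl_vae.safetensors",
        "-s", "-1",
        "-p", prompt,
        "--cfg-scale", "1.0",
        "--steps", "4",
        "-b", str(n_images),      # assumed batch-count flag: one load, many images
        "-o", "pics/output.png",  # how per-image filenames are derived is version-dependent
    ]

cmd = build_sd_command(16, "a cute cat")
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually run it
```

Passing an argv list to `subprocess.run` instead of a shell string also sidesteps quoting problems if the prompt ever contains apostrophes.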
Logs:
0%| | 0/16 [00:00<?, ?it/s]ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:169 - loading model from '../models/sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:732 - load ../models/sd_xl_turbo_1.0.q8_0.gguf using gguf format
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_malloc!
[INFO ] stable-diffusion.cpp:180 - loading vae from '../models/sdxl_vae.safetensors'
[INFO ] model.cpp:735 - load ../models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:192 - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:198 - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:404 - total params memory size = 3855.36MB (VRAM 3855.36MB, RAM 0.00MB): clip 835.53MB(VRAM), unet 2925.36MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:423 - loading model from '../models/sd_xl_turbo_1.0.q8_0.gguf' completed, taking 28.44s
[INFO ] stable-diffusion.cpp:440 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:556 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1585 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1698 - get_learned_condition completed, taking 368 ms
[INFO ] stable-diffusion.cpp:1716 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1720 - generating image: 1/1 - seed 1534224841
|==================================================| 4/4 - 3.30it/s
[INFO ] stable-diffusion.cpp:1763 - sampling completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1771 - generating 1 latent images completed, taking 1.26s
[INFO ] stable-diffusion.cpp:1774 - decoding 1 latents
6%|▋ | 1/16 [00:33<08:16, 33.11s/it]
[INFO ] stable-diffusion.cpp:1784 - latent 1 decoded, taking 0.85s
[INFO ] stable-diffusion.cpp:1788 - decode_first_stage completed, taking 0.85s
[INFO ] stable-diffusion.cpp:1872 - txt2img completed in 2.48s
save result image to 'pics/output_0.png'
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:169 - loading model from '../models/sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:732 - load ../models/sd_xl_turbo_1.0.q8_0.gguf using gguf format
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_malloc!
[INFO ] stable-diffusion.cpp:180 - loading vae from '../models/sdxl_vae.safetensors'
[INFO ] model.cpp:735 - load ../models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:192 - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:198 - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:404 - total params memory size = 3855.36MB (VRAM 3855.36MB, RAM 0.00MB): clip 835.53MB(VRAM), unet 2925.36MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:423 - loading model from '../models/sd_xl_turbo_1.0.q8_0.gguf' completed, taking 28.01s