Open shibingli opened 1 week ago
A solid black result is often related to the VAE and the image size. box is built on top of stable-diffusion.cpp, so I will try your case later.
BTW, could you try the uploaded artifact from this action run: https://github.com/gpustack/llama-box/actions/runs/11832443367/job/32969277953?
We also provide an all-in-one GGUF for SD 3.5 Large: https://huggingface.co/gpustack/stable-diffusion-v3-5-large-GGUF.
Here is the result from our test environment.
FYI, the log is as follows.
0.00.072.965 I ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
0.00.072.967 I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
0.00.072.967 I ggml_cuda_init: found 1 CUDA devices:
0.00.074.707 I Device 0: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
0.00.077.139 I
0.00.077.142 I version: main (a29fe58)
0.00.077.142 I compiler: cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
0.00.077.142 I target: x86_64-linux-gnu
0.00.077.142 I vendor:
0.00.077.143 I - llama.cpp 66798e42 (376)
0.00.077.143 I - stable-diffusion.cpp b5b57e9 (173)
0.00.078.333 I system_info: n_threads = 6 (n_threads_batch = 6) / 20 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
0.00.078.335 I
0.00.078.432 I srv main: listening, hostname = 127.0.0.1, port = 8080, n_threads = 3 + 2
0.00.079.564 I srv main: loading model
0.00.079.598 I load_from_file: loading model from '/home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf'
0.00.079.613 I init_from_file: load /home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf using gguf format
0.00.094.295 I load_from_file: Version: SD3.5 Large
0.00.097.853 I load_from_file: Weight type: q8_0
0.00.097.854 I load_from_file: Conditioner weight type: q8_0
0.00.097.854 I load_from_file: Diffusion model weight type: q8_0
0.00.097.854 I load_from_file: VAE weight type: f16
0.03.696.669 I load_from_file: total params memory size = 14412.42MB (VRAM 14412.42MB, RAM 0.00MB): clip 5661.64MB(VRAM), unet 8590.78MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
0.03.696.673 I load_from_file: loading model from '/home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf' completed, taking 2.97s
0.03.696.674 I load_from_file: running in FLOW mode
0.03.696.781 I srv load_model: sampler: euler, steps: 20, cfg scale: 4.50
0.03.696.782 I srv main: initializing server
0.03.696.783 I srv init: initializing slots, n_slots = 1
0.03.696.789 I slot init: id 0 | task -1 | new slot n_ctx_slot = 1112406280
0.03.696.875 I srv main: starting server
0.08.235.255 I srv oaicompat_images_generati: params: {"n":1,"prompt":"...","quality":"standard","response_format":"b64_json","size":"512x512"}
0.08.235.438 I slot launch_slot_with_task: id 0 | task 0 | processing task
0.08.235.853 I apply_loras: Attempting to apply 0 LoRAs
0.08.235.856 I generate_image: apply_loras completed, taking 0.00s
0.08.385.031 I generate_image: get_learned_condition completed, taking 149 ms
0.08.385.032 I generate_image: sampling using Euler method
0.08.385.033 I generate_image: generating image: 1/1 - seed 4294967295
0.08.832.152 I generate_image: sampling 001/020 - 0.34s/it
0.09.132.441 I generate_image: sampling 002/020 - 0.30s/it
0.09.432.522 I generate_image: sampling 003/020 - 0.30s/it
0.09.732.681 I generate_image: sampling 004/020 - 0.30s/it
0.10.032.784 I generate_image: sampling 005/020 - 0.30s/it
0.10.332.987 I generate_image: sampling 006/020 - 0.30s/it
0.10.633.147 I generate_image: sampling 007/020 - 0.30s/it
0.10.933.553 I generate_image: sampling 008/020 - 0.30s/it
0.11.233.649 I generate_image: sampling 009/020 - 0.30s/it
0.11.533.804 I generate_image: sampling 010/020 - 0.30s/it
0.11.834.027 I generate_image: sampling 011/020 - 0.30s/it
0.12.134.123 I generate_image: sampling 012/020 - 0.30s/it
0.12.434.221 I generate_image: sampling 013/020 - 0.30s/it
0.12.734.299 I generate_image: sampling 014/020 - 0.30s/it
0.13.034.417 I generate_image: sampling 015/020 - 0.30s/it
0.13.334.533 I generate_image: sampling 016/020 - 0.30s/it
0.13.634.596 I generate_image: sampling 017/020 - 0.30s/it
0.13.934.691 I generate_image: sampling 018/020 - 0.30s/it
0.14.234.861 I generate_image: sampling 019/020 - 0.30s/it
0.14.534.980 I generate_image: sampling 020/020 - 0.30s/it
0.14.535.366 I generate_image: sampling completed, taking 6.15s
0.14.535.368 I generate_image: generating 1 latent images completed, taking 6.15s
0.14.535.369 I generate_image: decoding 1 latents
0.14.727.549 I generate_image: latent 1 decoded, taking 0.19s
0.14.727.550 I generate_image: decode_first_stage completed, taking 0.19s
0.14.731.077 I txt2img: txt2img completed in 6.49s
0.14.830.609 I slot release: id 0 | task 0 | stop processing: n_past = 0, truncated = 0
0.14.830.720 I srv log_server_request: request 200: POST /v1/images/generations 127.0.0.1:36270
0.52.385.462 I srv oaicompat_images_generati: params: {"n":1,"prompt":"...","quality":"standard","response_format":"b64_json","size":"1024x1024"}
0.52.385.573 I slot launch_slot_with_task: id 0 | task 33561 | processing task
0.52.386.502 I apply_loras: Attempting to apply 0 LoRAs
0.52.386.504 I generate_image: apply_loras completed, taking 0.00s
0.52.508.524 I generate_image: get_learned_condition completed, taking 122 ms
0.52.508.525 I generate_image: sampling using Euler method
0.52.508.526 I generate_image: generating image: 1/1 - seed 4294967295
0.54.899.931 I generate_image: sampling 001/020 - 1.94s/it
0.56.839.092 I generate_image: sampling 002/020 - 1.94s/it
0.58.776.527 I generate_image: sampling 003/020 - 1.94s/it
1.00.714.234 I generate_image: sampling 004/020 - 1.94s/it
1.02.651.816 I generate_image: sampling 005/020 - 1.94s/it
1.04.589.630 I generate_image: sampling 006/020 - 1.94s/it
1.06.529.040 I generate_image: sampling 007/020 - 1.94s/it
1.08.468.232 I generate_image: sampling 008/020 - 1.94s/it
1.10.407.831 I generate_image: sampling 009/020 - 1.94s/it
1.12.347.398 I generate_image: sampling 010/020 - 1.94s/it
1.14.287.146 I generate_image: sampling 011/020 - 1.94s/it
1.16.226.729 I generate_image: sampling 012/020 - 1.94s/it
1.18.166.256 I generate_image: sampling 013/020 - 1.94s/it
1.20.106.069 I generate_image: sampling 014/020 - 1.94s/it
1.22.046.248 I generate_image: sampling 015/020 - 1.94s/it
1.23.986.123 I generate_image: sampling 016/020 - 1.94s/it
1.25.925.880 I generate_image: sampling 017/020 - 1.94s/it
1.27.865.817 I generate_image: sampling 018/020 - 1.94s/it
1.29.806.029 I generate_image: sampling 019/020 - 1.94s/it
1.31.745.792 I generate_image: sampling 020/020 - 1.94s/it
1.31.747.293 I generate_image: sampling completed, taking 39.24s
1.31.747.296 I generate_image: generating 1 latent images completed, taking 39.24s
1.31.747.296 I generate_image: decoding 1 latents
1.32.624.759 I generate_image: latent 1 decoded, taking 0.88s
1.32.624.760 I generate_image: decode_first_stage completed, taking 0.88s
1.32.638.224 I txt2img: txt2img completed in 40.25s
1.33.037.808 I slot release: id 0 | task 33561 | stop processing: n_past = 0, truncated = 0
1.33.038.267 I srv log_server_request: request 200: POST /v1/images/generations 127.0.0.1:45150
@thxCode Thank you very much. I am updating the program and will then download the all-in-one SD model you provided. I will report the results once that is done.
@shibingli, I found that our current releases don't use CUDA correctly yet. I will release a new version that supports this and ping you later. Thanks.
@shibingli please give v0.0.78 a try.
@thxCode Great! With the latest version you specified and the all-in-one model you provided, the image was generated successfully!
./llama-box --host 127.0.0.1 --port 29353 --threads 62 --parallel 4 -m /data/llm_models/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q4_0.gguf --alias sd3.5-large --images --image-no-text-encoder-model-offload --image-no-vae-model-offload --image-max-height 1792 --image-max-width 1792 --image-cfg-scale 4.50 --image-max-batch 4 --image-sampler euler
{
"model": "sd3.5-large",
"prompt": "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution",
"n": 1,
"response_format": "b64_json",
"size": "512x512"
}
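For reference, the request above can also be sent from a script. This is a minimal Python sketch, assuming the server was started with the command above (port 29353) and serves the /v1/images/generations route that appears in the server logs in this thread. The function names build_request and generate are made up for illustration.

```python
import json
import urllib.request


def build_request(base_url, prompt, size="512x512", model="sd3.5-large"):
    """Build a POST request carrying the same JSON fields as the body above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "response_format": "b64_json",
        "size": size,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, **kwargs):
    """Send the request and return the parsed JSON reply (needs a running server)."""
    request = build_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)


# Example call, left commented out because it needs the server to be up:
# reply = generate("http://127.0.0.1:29353", "world inside a glass sphere")
```

The long timeout is a guess that leaves room for large sizes; the 1024x1024 run logged earlier in this thread took about 40 seconds.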
BTW: a GGUF that I generate myself from the official SD3.5 model still fails to produce an image; the result is a black screen.
@shibingli can you detail the process you used to generate the GGUF from the official SD3.5 model that still fails to produce an image and results in a black screen? I am curious about the approach you tried. box is based on stable-diffusion.cpp, and I hope we can keep the experience similar to it.
1、Clone stable-diffusion.cpp and compile it.
$ git clone --recurse-submodules https://github.com/leejet/stable-diffusion.cpp
Cloning into 'stable-diffusion.cpp'...
remote: Enumerating objects: 1153, done.
remote: Counting objects: 100% (301/301), done.
remote: Compressing objects: 100% (96/96), done.
remote: Total 1153 (delta 251), reused 212 (delta 205), pack-reused 852 (from 1)
Receiving objects: 100% (1153/1153), 21.79 MiB | 2.06 MiB/s, done.
Resolving deltas: 100% (699/699), done.
Submodule 'ggml' (https://github.com/ggerganov/ggml.git) registered for path 'ggml'
Cloning into '/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp/ggml'...
remote: Enumerating objects: 11833, done.
remote: Counting objects: 100% (5108/5108), done.
remote: Compressing objects: 100% (1121/1121), done.
remote: Total 11833 (delta 4293), reused 4254 (delta 3961), pack-reused 6725 (from 1)
Receiving objects: 100% (11833/11833), 11.30 MiB | 13.98 MiB/s, done.
Resolving deltas: 100% (8087/8087), done.
Submodule path 'ggml': checked out '21d3a308fcb7f31cb9beceaeebad4fb622f3c337'
$ cd stable-diffusion.cpp
$ cmake -B build -DSD_CUBLAS=ON
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Use CUBLAS as backend stable-diffusion
-- Build static library
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP found
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.6.77")
-- CUDA found
-- Using CUDA architectures: 52;61;70;75
-- The CUDA compiler identification is NVIDIA 12.6.77
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 13.2.0
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/build
$ cmake --build build --config Release -j 128
[ 1%] Building C object thirdparty/CMakeFiles/zip.dir/zip.c.o
[ 2%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 5%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-backend.c.o
[ 5%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 7%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o
[ 7%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o
[ 8%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/binbcast.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/argsort.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o
[ 12%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/clamp.cu.o
[ 14%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/concat.cu.o
[ 15%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/conv-transpose-1d.cu.o
[ 16%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/convert.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cross-entropy-loss.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cpy.cu.o
[ 20%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/diagmask.cu.o
[ 21%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/dmmv.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f32.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f16.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/getrows.cu.o
In file included from /opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/thirdparty/zip.c:40:
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/thirdparty/miniz.h:4988:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
4988 | #pragma message( \
| ^~~~~~~
[ 28%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/im2col.cu.o
[ 29%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o
[ 30%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o
[ 32%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/norm.cu.o
[ 33%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pad.cu.o
[ 33%] Built target zip
[ 34%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pool2d.cu.o
[ 35%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/quantize.cu.o
[ 37%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/rope.cu.o
[ 38%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o
[ 39%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o
[ 41%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o
[ 42%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o
[ 43%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o
[ 44%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o
[ 47%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu.o
[ 47%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda.cu.o
[ 48%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu.o
[ 50%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu.o
[ 51%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu.o
[ 52%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu.o
[ 55%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq1_s.cu.o
[ 55%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu.o
[ 56%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_s.cu.o
[ 57%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu.o
[ 58%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_s.cu.o
[ 60%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu.o
[ 61%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu.o
[ 62%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu.o
[ 64%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q2_k.cu.o
[ 65%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q3_k.cu.o
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2436): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced
static void set_ggml_graph_node_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2448): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced
static bool ggml_graph_node_has_matching_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2436): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced
static void set_ggml_graph_node_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2448): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced
static bool ggml_graph_node_has_matching_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2436): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced
static void set_ggml_graph_node_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2448): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced
static bool ggml_graph_node_has_matching_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2436): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced
static void set_ggml_graph_node_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2448): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced
static bool ggml_graph_node_has_matching_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
^
[ 66%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_0.cu.o
[ 67%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_1.cu.o
[ 69%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_k.cu.o
[ 70%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_0.cu.o
[ 71%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_1.cu.o
[ 73%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_k.cu.o
[ 74%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q6_k.cu.o
[ 75%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
[ 76%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cu.o
[ 79%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cu.o
[ 79%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cu.o
[ 80%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cu.o
[ 83%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cu.o
[ 84%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cu.o
[ 84%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cu.o
[ 85%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cu.o
[ 87%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cu.o
[ 88%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cu.o
[ 89%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
[ 91%] Linking CUDA static library libggml.a
[ 91%] Built target ggml
[ 92%] Building CXX object CMakeFiles/stable-diffusion.dir/model.cpp.o
[ 93%] Building CXX object CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o
[ 94%] Building CXX object CMakeFiles/stable-diffusion.dir/util.cpp.o
[ 96%] Building CXX object CMakeFiles/stable-diffusion.dir/upscaler.cpp.o
[ 97%] Linking CXX static library libstable-diffusion.a
[ 97%] Built target stable-diffusion
[ 98%] Building CXX object examples/cli/CMakeFiles/sd.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/sd
[100%] Built target sd
$ cd build/bin
$ ls
sd
2、Generate a gguf model.
$ ./sd -M convert -m /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors -o /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf -v --type q4_1 && ./sd -M convert -m /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors -o /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf -v --type q8_0
Option:
n_threads: 64
mode: convert
model_path: /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
wtype: q4_1
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
strength(control): 0.90
prompt:
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 1
AVX512_VNNI = 1
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO ] model.cpp:804 - load /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors using safetensors format
[DEBUG] model.cpp:872 - init from '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'
[INFO ] model.cpp:1794 - model tensors mem size: 5349.79MB
[DEBUG] model.cpp:1548 - loading tensors from /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
[INFO ] model.cpp:1829 - load tensors done
[INFO ] model.cpp:1830 - trying to save tensors to /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf
convert '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'/'' to '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf' success
Option:
n_threads: 64
mode: convert
model_path: /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
wtype: q8_0
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
strength(control): 0.90
prompt:
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 1
AVX512_VNNI = 1
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO ] model.cpp:804 - load /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors using safetensors format
[DEBUG] model.cpp:872 - init from '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'
[INFO ] model.cpp:1794 - model tensors mem size: 8693.64MB
[DEBUG] model.cpp:1548 - loading tensors from /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
[INFO ] model.cpp:1829 - load tensors done
[INFO ] model.cpp:1830 - trying to save tensors to /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf
convert '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'/'' to '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf' success
$ cd /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF
$ ls
sd3.5_large-q4_1.gguf sd3.5_large-q8_0.gguf
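As a quick sanity check on the converted files, the fixed-size GGUF header can be read directly. The sketch below assumes a little-endian GGUF of version 2 or later (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count). The helper names are hypothetical, and a valid header says nothing about whether the file contains everything llama-box needs.

```python
import struct


def parse_gguf_header(data):
    """Parse the fixed GGUF header: magic, version, tensor and metadata counts."""
    if len(data) < 24 or data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", data[4:24])
    if version < 2:
        raise ValueError("GGUF v1 uses a different header layout")
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}


def read_gguf_header(path):
    """Read only the first 24 bytes of a file and parse them as a GGUF header."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(24))


# Example call, commented out because the path only exists on the machine above:
# print(read_gguf_header("sd3.5_large-q4_1.gguf"))
```

Comparing the tensor count of a self-converted file with that of the all-in-one file may hint at whether components such as the text encoders were packed in, though that is only an indirect signal.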
3、Run the model using llama-box.
$ ./llama-box --host 127.0.0.1 --port 7219 --threads 62 --parallel 4 -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --alias sd3.5-large --images --image-max-batch 4 --image-t5xxl-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --image-clip-l-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --image-clip-g-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors
0.00.137.105 I ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
0.00.137.109 I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
0.00.137.110 I ggml_cuda_init: found 4 CUDA devices:
0.00.139.807 I Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.143.877 I Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.165.996 I Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.168.633 I Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.649.635 I
0.00.649.640 I version: dev (457528f)
0.00.649.641 I compiler: cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
0.00.649.641 I target: x86_64-linux-gnu
0.00.649.641 I vendor:
0.00.649.642 I - llama.cpp 4a8ccb37 (392)
0.00.649.642 I - stable-diffusion.cpp 4e75394 (181)
0.00.650.870 I system_info: n_threads = 62 (n_threads_batch = 62) / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
0.00.650.872 I
0.00.650.921 I srv main: listening, hostname = 127.0.0.1, port = 7219, n_threads = 6 + 2
0.00.652.194 I srv main: loading model
0.00.652.229 I load_from_file: loading model from '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf'
0.00.652.263 I init_from_file: load /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf using gguf format
0.00.659.035 I load_from_file: loading clip_l from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors'
0.00.659.435 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors using safetensors format
0.00.660.086 I load_from_file: loading clip_g from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors'
0.00.661.027 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors using safetensors format
0.00.662.695 I load_from_file: loading t5xxl from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors'
0.00.663.057 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors using safetensors format
0.00.663.979 I load_from_file: Version: SD3.5 Large
0.00.665.345 I load_from_file: Weight type: q4_1
0.00.665.348 I load_from_file: Conditioner weight type: f16
0.00.665.348 I load_from_file: Diffusion model weight type: q4_1
0.00.665.348 I load_from_file: VAE weight type: q4_1
0.07.269.982 I operator(): unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
0.07.446.721 I load_from_file: total params memory size = 16050.10MB (VRAM 16050.10MB, RAM 0.00MB): clip 10648.12MB(VRAM), unet 5241.98MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
0.07.446.725 I load_from_file: loading model from '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf' completed, taking 6.45s
0.07.446.726 I load_from_file: running in FLOW mode
0.07.446.924 I srv load_model: sampler: euler, steps: 10, cfg scale: 4.50
0.07.446.927 I srv main: initializing server
0.07.446.928 I srv init: initializing slots, n_slots = 4
0.07.446.929 I slot init: id 0 | task -1 | new slot n_ctx_slot = 0
0.07.446.933 I slot init: id 1 | task -1 | new slot n_ctx_slot = 0
0.07.446.934 I slot init: id 2 | task -1 | new slot n_ctx_slot = 0
0.07.446.935 I slot init: id 3 | task -1 | new slot n_ctx_slot = 0
0.07.446.954 I srv main: starting server
Note the unknown tensor message in the log above:
0.07.269.982 I operator(): unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
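One way to compare this run with the working one is to pull the weight type lines out of both logs. The sketch below is only a log-parsing helper with made-up function names. It relies on the load_from_file line format shown in the logs in this thread, and any difference it reports (for example a VAE weight type of f16 in the all-in-one file versus q4_1 here) is a lead to investigate, not a confirmed cause.

```python
import re

# Matches lines such as "load_from_file: VAE weight type: f16".
WEIGHT_LINE = re.compile(r"load_from_file: (.*?[Ww]eight type): (\S+)")


def weight_types(log_text):
    """Map each reported component label to its weight type."""
    return {m.group(1): m.group(2) for m in WEIGHT_LINE.finditer(log_text)}


def diff_weight_types(first, second):
    """Return the labels whose weight type differs, as (first, second) pairs."""
    labels = sorted(set(first) | set(second))
    return {
        label: (first.get(label), second.get(label))
        for label in labels
        if first.get(label) != second.get(label)
    }
```

Feeding it the two server logs from this thread as strings would list every component whose type differs between the all-in-one Q8_0 file and the self-converted q4_1 file.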
4、Send a POST request to generate an image.
{
"model": "sd3.5-large",
"prompt": "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution",
"n": 1,
"response_format": "b64_json",
"size": "512x512"
}
3.14.266.389 I srv oaicompat_images_generati: params: {"model":"sd3.5-large","n":1,"prompt":"...","response_format":"b64_json","size":"512x512"}
3.14.266.653 I slot launch_slot_with_task: id 0 | task 0 | processing task
3.14.795.859 I generate_image: get_learned_condition completed, taking 528 ms
3.14.795.865 I generate_image: sampling using Euler method
3.14.795.866 I generate_image: generating image: 1/1 - seed 4294967295
3.15.425.313 I generate_image: sampling 001/010 - 0.61s/it
3.16.009.286 I generate_image: sampling 002/010 - 0.58s/it
3.16.590.767 I generate_image: sampling 003/010 - 0.58s/it
3.17.171.978 I generate_image: sampling 004/010 - 0.58s/it
3.17.753.149 I generate_image: sampling 005/010 - 0.58s/it
3.18.334.177 I generate_image: sampling 006/010 - 0.58s/it
3.18.915.758 I generate_image: sampling 007/010 - 0.58s/it
3.19.498.351 I generate_image: sampling 008/010 - 0.58s/it
3.20.080.382 I generate_image: sampling 009/010 - 0.58s/it
3.20.663.899 I generate_image: sampling 010/010 - 0.58s/it
3.20.664.512 I generate_image: sampling completed, taking 5.87s
3.20.664.516 I generate_image: generating 1 latent images completed, taking 5.87s
3.20.664.517 I generate_image: decoding 1 latents
3.20.895.215 I generate_image: latent 1 decoded, taking 0.23s
3.20.895.217 I generate_image: decode_first_stage completed, taking 0.23s
3.20.897.226 I txt2img: txt2img completed in 6.63s
3.20.909.999 I slot release: id 0 | task 0 | stop processing: n_past = 0, truncated = 0
3.20.910.166 I srv log_server_request: request 200: POST /v1/images/generations 127.0.0.1:26352
5、Receive the response from the request.
{
"created": 1732000352,
"data": [
{
"b64_json": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAAd1ElEQVR4XmMYBaMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBo
CoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBo
CoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBo
CoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBo
CoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBo
CoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhQEIIAAACtAAB03JHdwAAAABJRU5ErkJggA=="
}
],
"model": "sd3.5-large",
"object": "list"
}
6、Test directly using a browser.

7、The browser displays the returned image, which is completely black.
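To confirm the black result without a browser, the `b64_json` payload can be checked programmatically. The following is a minimal sketch, not part of llama-box: it assumes the payload is an 8-bit, non-interlaced grayscale or RGB PNG (the IHDR in the response above decodes to 512x512, 8-bit RGB), and the function names `decode_png_rows` and `is_solid_black` are hypothetical helpers introduced here for illustration.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a, b, c):
    """Paeth predictor from the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def decode_png_rows(png):
    """Yield unfiltered scanlines of an 8-bit, non-interlaced gray/RGB PNG."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, header = 8, b"", None
    while pos < len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif kind == b"IDAT":
            idat += data
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    channels = {0: 1, 2: 3}.get(color)  # assumption: no alpha, no palette
    if depth != 8 or interlace != 0 or channels is None:
        raise ValueError("only 8-bit non-interlaced gray/RGB PNGs are handled")
    raw = zlib.decompress(idat)
    stride = width * channels
    prev = bytearray(stride)
    for y in range(height):
        start = y * (stride + 1)
        ftype = raw[start]  # per-row filter type (0..4)
        row = bytearray(raw[start + 1:start + 1 + stride])
        if ftype:
            for i in range(stride):
                a = row[i - channels] if i >= channels else 0
                b = prev[i]
                c = prev[i - channels] if i >= channels else 0
                if ftype == 1:
                    row[i] = (row[i] + a) & 0xFF
                elif ftype == 2:
                    row[i] = (row[i] + b) & 0xFF
                elif ftype == 3:
                    row[i] = (row[i] + (a + b) // 2) & 0xFF
                elif ftype == 4:
                    row[i] = (row[i] + _paeth(a, b, c)) & 0xFF
        yield row
        prev = row


def is_solid_black(b64_png):
    """Return True if every sample of the base64-encoded PNG is zero."""
    png = base64.b64decode(b64_png)
    return all(not any(row) for row in decode_png_rows(png))
```

Calling `is_solid_black(response["data"][0]["b64_json"])` on the parsed response would then report whether the server returned a solid-black image, which makes the check easy to repeat while trying different offload settings.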
./sd -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --t5xxl /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --clip_l /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --clip_g /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors -p 'a lovely cat holding a sign says "Stable diffusion 3.5 Large"' --cfg-scale 4.5 --sampling-method euler --steps 20
Following your steps, I found something interesting.
On an Apple M1 Max, the result is as expected.
On an NVIDIA 4090, the result is still solid black until I offload the text encoders to the CPU (with --image-no-text-encoder-model-offload).
Unlike SD.cpp, llama-box tries to offload all components to the GPU first. For models with a T5XXL text encoder, you can use --image-no-text-encoder-model-offload as a workaround to avoid solid-black image generation.
The image generated using llama-box is black, with the following command:
./llama-box --host 127.0.0.1 --port 8080 -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --alias sd3.5-large --images --image-t5xxl-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --image-sampler euler --image-clip-l-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --image-clip-g-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors --image-cfg-scale 4.50 --image-sample-steps 20
Request as follows:
{
"model": "sd3.5-large",
"prompt": "a lovely cat holding a sign says \"Stable diffusion 3.5 Large\"",
"n": 1,
"size": "1024x1024"
}
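For reproducing the request from a script, here is a minimal sketch using only the Python standard library. It assumes the server is reachable at 127.0.0.1:8080 and serves POST /v1/images/generations, as the logs in this thread indicate. The `response_format` field is borrowed from the other request in this thread, and the helper names `build_request` and `generate_b64` are hypothetical.

```python
import json
import urllib.request

# Assumption taken from the logs in this thread: host 127.0.0.1, port 8080.
BASE_URL = "http://127.0.0.1:8080"


def build_request(prompt, size="1024x1024", model="sd3.5-large",
                  base_url=BASE_URL):
    """Build (but do not send) the image-generation request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "response_format": "b64_json",
        "size": size,
    }
    return urllib.request.Request(
        base_url + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_b64(prompt, **kwargs):
    """Send the request and return the first image as a base64 string."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["data"][0]["b64_json"]
```

Building the payload with `json.dumps` also takes care of escaping the double quotes inside the prompt, so the prompt can be passed as an ordinary Python string.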
After base64-decoding the result, the image is black. The same decoding method applied to the output of OpenAI's DALL-E 3 works normally and yields a correct image.
The image is as follows:
Using https://github.com/leejet/stable-diffusion.cpp, the same model and parameters generate normal images, with the command:
./sd -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --t5xxl /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --clip_l /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --clip_g /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors -p 'a lovely cat holding a sign says \"Stable diffusion 3.5 Large\"' --cfg-scale 4.5 --sampling-method euler --steps 20
The image is as follows:
Ubuntu 24.04 CUDA 12.6