gpustack / llama-box

LLM inference server implementation based on llama.cpp.
MIT License

The image generation failed. #6

Open shibingli opened 1 week ago

shibingli commented 1 week ago

The image generated using llama-box is black, with the following command:

./llama-box --host 127.0.0.1 --port 8080 -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --alias sd3.5-large --images --image-t5xxl-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --image-sampler euler --image-clip-l-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --image-clip-g-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors --image-cfg-scale 4.50 --image-sample-steps 20
Request as follows:

{
"model": "sd3.5-large",
"prompt": "a lovely cat holding a sign says \"Stable diffusion 3.5 Large\"",
"n": 1,
"size": "1024x1024"
}
After base64-decoding the result, the image is entirely black. The same decoding method applied to a response from OpenAI's DALL-E 3 works normally and yields a correct image.
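For anyone reproducing this, a minimal sketch of the decode-and-check step could look like the following. It assumes the response has the OpenAI-style shape `{"data": [{"b64_json": ...}]}` and that the payload is an 8-bit, non-interlaced grayscale or RGB PNG; neither is confirmed for llama-box in this thread, and both function names are made up for illustration.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_first_image(response: dict) -> bytes:
    """Return the raw bytes of the first image (assumed OpenAI-style response)."""
    return base64.b64decode(response["data"][0]["b64_json"])


def png_is_solid_black(png: bytes) -> bool:
    """Report whether an 8-bit grayscale or RGB PNG holds only zero samples."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    ihdr, idat, pos = None, b"", len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        payload = png[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            ihdr = payload
        elif ctype == b"IDAT":
            idat += payload
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    if ihdr is None:
        raise ValueError("PNG has no IHDR chunk")
    width, height, depth, color, _, _, interlace = struct.unpack(">IIBBBBB", ihdr)
    if depth != 8 or color not in (0, 2) or interlace != 0:
        raise ValueError("only 8-bit non-interlaced grayscale/RGB is handled")
    stride = 1 + width * (1 if color == 0 else 3)  # filter byte + samples
    raw = zlib.decompress(idat)
    # Every PNG filter predicts from neighbouring samples, so an all-zero
    # image is stored as all-zero filtered bytes and vice versa; the
    # per-row filter-type byte is skipped.
    return all(not any(raw[r * stride + 1:(r + 1) * stride]) for r in range(height))
```

Calling `png_is_solid_black(decode_first_image(response))` on the parsed JSON response gives a quick yes/no without opening an image viewer. Images with an alpha channel are rejected rather than guessed at.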

The image is as follows: [image: b408d8c5-aa2b-461b-8721-2b7720115a5d]

Using https://github.com/leejet/stable-diffusion.cpp, the same model and parameters generate normal images, with the command:

./sd -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --t5xxl /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --clip_l /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --clip_g /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors -p 'a lovely cat holding a sign says \"Stable diffusion 3.5 Large\"' --cfg-scale 4.5 --sampling-method euler --steps 20
The image is as follows: [image: output]


Ubuntu 24.04 CUDA 12.6

thxCode commented 1 week ago

A solid black result is often related to the VAE and the image size. llama-box is built on top of stable-diffusion.cpp, so I will try your case later.

BTW, could you try the artifact uploaded by this Actions run? https://github.com/gpustack/llama-box/actions/runs/11832443367/job/32969277953

We also provide an all-in-one GGUF for SD 3.5 Large: https://huggingface.co/gpustack/stable-diffusion-v3-5-large-GGUF

Here are the results from our test environment.

[four result images]

FYI, the log is as follows.

0.00.072.965 I ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
0.00.072.967 I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
0.00.072.967 I ggml_cuda_init: found 1 CUDA devices:
0.00.074.707 I   Device 0: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
0.00.077.139 I 
0.00.077.142 I version: main (a29fe58)
0.00.077.142 I compiler: cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
0.00.077.142 I target: x86_64-linux-gnu
0.00.077.142 I vendor: 
0.00.077.143 I - llama.cpp 66798e42 (376)
0.00.077.143 I - stable-diffusion.cpp b5b57e9 (173)
0.00.078.333 I system_info: n_threads = 6 (n_threads_batch = 6) / 20 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
0.00.078.335 I 
0.00.078.432 I srv                       main: listening, hostname = 127.0.0.1, port = 8080, n_threads = 3 + 2
0.00.079.564 I srv                       main: loading model
0.00.079.598 I load_from_file: loading model from '/home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf'
0.00.079.613 I init_from_file: load /home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf using gguf format
0.00.094.295 I load_from_file: Version: SD3.5 Large 
0.00.097.853 I load_from_file: Weight type:                 q8_0
0.00.097.854 I load_from_file: Conditioner weight type:     q8_0
0.00.097.854 I load_from_file: Diffusion model weight type: q8_0
0.00.097.854 I load_from_file: VAE weight type:             f16
0.03.696.669 I load_from_file: total params memory size = 14412.42MB (VRAM 14412.42MB, RAM 0.00MB): clip 5661.64MB(VRAM), unet 8590.78MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
0.03.696.673 I load_from_file: loading model from '/home/frank/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q8_0.gguf' completed, taking 2.97s
0.03.696.674 I load_from_file: running in FLOW mode
0.03.696.781 I srv                 load_model: sampler: euler, steps: 20, cfg scale: 4.50
0.03.696.782 I srv                       main: initializing server
0.03.696.783 I srv                       init: initializing slots, n_slots = 1
0.03.696.789 I slot                      init: id  0 | task -1 | new slot n_ctx_slot = 1112406280
0.03.696.875 I srv                       main: starting server
0.08.235.255 I srv  oaicompat_images_generati: params: {"n":1,"prompt":"...","quality":"standard","response_format":"b64_json","size":"512x512"}
0.08.235.438 I slot     launch_slot_with_task: id  0 | task 0 | processing task
0.08.235.853 I apply_loras: Attempting to apply 0 LoRAs
0.08.235.856 I generate_image: apply_loras completed, taking 0.00s
0.08.385.031 I generate_image: get_learned_condition completed, taking 149 ms
0.08.385.032 I generate_image: sampling using Euler method
0.08.385.033 I generate_image: generating image: 1/1 - seed 4294967295
0.08.832.152 I generate_image: sampling 001/020 - 0.34s/it
0.09.132.441 I generate_image: sampling 002/020 - 0.30s/it
0.09.432.522 I generate_image: sampling 003/020 - 0.30s/it
0.09.732.681 I generate_image: sampling 004/020 - 0.30s/it
0.10.032.784 I generate_image: sampling 005/020 - 0.30s/it
0.10.332.987 I generate_image: sampling 006/020 - 0.30s/it
0.10.633.147 I generate_image: sampling 007/020 - 0.30s/it
0.10.933.553 I generate_image: sampling 008/020 - 0.30s/it
0.11.233.649 I generate_image: sampling 009/020 - 0.30s/it
0.11.533.804 I generate_image: sampling 010/020 - 0.30s/it
0.11.834.027 I generate_image: sampling 011/020 - 0.30s/it
0.12.134.123 I generate_image: sampling 012/020 - 0.30s/it
0.12.434.221 I generate_image: sampling 013/020 - 0.30s/it
0.12.734.299 I generate_image: sampling 014/020 - 0.30s/it
0.13.034.417 I generate_image: sampling 015/020 - 0.30s/it
0.13.334.533 I generate_image: sampling 016/020 - 0.30s/it
0.13.634.596 I generate_image: sampling 017/020 - 0.30s/it
0.13.934.691 I generate_image: sampling 018/020 - 0.30s/it
0.14.234.861 I generate_image: sampling 019/020 - 0.30s/it
0.14.534.980 I generate_image: sampling 020/020 - 0.30s/it
0.14.535.366 I generate_image: sampling completed, taking 6.15s
0.14.535.368 I generate_image: generating 1 latent images completed, taking 6.15s
0.14.535.369 I generate_image: decoding 1 latents
0.14.727.549 I generate_image: latent 1 decoded, taking 0.19s
0.14.727.550 I generate_image: decode_first_stage completed, taking 0.19s
0.14.731.077 I txt2img: txt2img completed in 6.49s
0.14.830.609 I slot                   release: id  0 | task 0 | stop processing: n_past = 0, truncated = 0
0.14.830.720 I srv         log_server_request: request 200: POST /v1/images/generations 127.0.0.1:36270
0.52.385.462 I srv  oaicompat_images_generati: params: {"n":1,"prompt":"...","quality":"standard","response_format":"b64_json","size":"1024x1024"}
0.52.385.573 I slot     launch_slot_with_task: id  0 | task 33561 | processing task
0.52.386.502 I apply_loras: Attempting to apply 0 LoRAs
0.52.386.504 I generate_image: apply_loras completed, taking 0.00s
0.52.508.524 I generate_image: get_learned_condition completed, taking 122 ms
0.52.508.525 I generate_image: sampling using Euler method
0.52.508.526 I generate_image: generating image: 1/1 - seed 4294967295
0.54.899.931 I generate_image: sampling 001/020 - 1.94s/it
0.56.839.092 I generate_image: sampling 002/020 - 1.94s/it
0.58.776.527 I generate_image: sampling 003/020 - 1.94s/it
1.00.714.234 I generate_image: sampling 004/020 - 1.94s/it
1.02.651.816 I generate_image: sampling 005/020 - 1.94s/it
1.04.589.630 I generate_image: sampling 006/020 - 1.94s/it
1.06.529.040 I generate_image: sampling 007/020 - 1.94s/it
1.08.468.232 I generate_image: sampling 008/020 - 1.94s/it
1.10.407.831 I generate_image: sampling 009/020 - 1.94s/it
1.12.347.398 I generate_image: sampling 010/020 - 1.94s/it
1.14.287.146 I generate_image: sampling 011/020 - 1.94s/it
1.16.226.729 I generate_image: sampling 012/020 - 1.94s/it
1.18.166.256 I generate_image: sampling 013/020 - 1.94s/it
1.20.106.069 I generate_image: sampling 014/020 - 1.94s/it
1.22.046.248 I generate_image: sampling 015/020 - 1.94s/it
1.23.986.123 I generate_image: sampling 016/020 - 1.94s/it
1.25.925.880 I generate_image: sampling 017/020 - 1.94s/it
1.27.865.817 I generate_image: sampling 018/020 - 1.94s/it
1.29.806.029 I generate_image: sampling 019/020 - 1.94s/it
1.31.745.792 I generate_image: sampling 020/020 - 1.94s/it
1.31.747.293 I generate_image: sampling completed, taking 39.24s
1.31.747.296 I generate_image: generating 1 latent images completed, taking 39.24s
1.31.747.296 I generate_image: decoding 1 latents
1.32.624.759 I generate_image: latent 1 decoded, taking 0.88s
1.32.624.760 I generate_image: decode_first_stage completed, taking 0.88s
1.32.638.224 I txt2img: txt2img completed in 40.25s
1.33.037.808 I slot                   release: id  0 | task 33561 | stop processing: n_past = 0, truncated = 0
1.33.038.267 I srv         log_server_request: request 200: POST /v1/images/generations 127.0.0.1:45150
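
The per-step timings in the log above can be summarized with a short script. This is only a sketch: the regular expression is written against the `sampling NNN/NNN - X.XXs/it` lines as they appear in this log, and `mean_seconds_per_step` is an illustrative name.

```python
import re

# Matches lines such as "generate_image: sampling 001/020 - 0.34s/it".
STEP_RE = re.compile(r"sampling (\d+)/(\d+) - ([\d.]+)s/it")


def mean_seconds_per_step(log_lines):
    """Average the s/it values of all sampling-step lines, or None if absent."""
    times = [float(m.group(3)) for line in log_lines if (m := STEP_RE.search(line))]
    return sum(times) / len(times) if times else None
```

In the log above, the 512x512 request runs at about 0.30 s/it and the 1024x1024 request at 1.94 s/it, so four times the pixels cost roughly 6.5 times as long per step on this machine.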
shibingli commented 1 week ago

@thxCode Thank you very much. I will update the program and download the all-in-one SD model you provided, then report back with the results.

thxCode commented 1 week ago

@shibingli, I found that our current releases do not use CUDA correctly yet. I will release a new version to fix this and ping you later. Thanks.
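
A rough way to see whether a given build initializes the CUDA backend at all is to look for the `ggml_cuda_init` lines in the startup log, like the ones in the logs in this thread. The sketch below only parses those lines; it cannot tell whether CUDA is then used correctly, which may be a different problem from the one described here. `cuda_devices_from_log` is a hypothetical helper name.

```python
import re

FOUND_RE = re.compile(r"ggml_cuda_init: found (\d+) CUDA devices")
DEVICE_RE = re.compile(r"Device (\d+): (.+?), compute capability (\d+\.\d+)")


def cuda_devices_from_log(log_text: str):
    """Return (reported device count, [(index, name, compute capability), ...])."""
    found = FOUND_RE.search(log_text)
    count = int(found.group(1)) if found else 0
    devices = [(int(i), name, cc) for i, name, cc in DEVICE_RE.findall(log_text)]
    return count, devices
```

A count of 0 with an empty device list suggests the binary did not log any CUDA initialization.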

thxCode commented 6 days ago

@shibingli please give v0.0.78 a try.

shibingli commented 4 days ago

@thxCode Great! With the latest version you specified and the all-in-one model you provided, the image was generated successfully!

./llama-box --host 127.0.0.1 --port 29353 --threads 62 --parallel 4 -m /data/llm_models/gpustack/stable-diffusion-v3-5-large-GGUF/stable-diffusion-v3-5-large-Q4_0.gguf --alias sd3.5-large --images --image-no-text-encoder-model-offload --image-no-vae-model-offload --image-max-height 1792 --image-max-width 1792 --image-cfg-scale 4.50 --image-max-batch 4 --image-sampler euler
{
    "model": "sd3.5-large",
    "prompt": "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution",
    "n": 1,
    "response_format": "b64_json",
    "size": "512x512"
 }

[image: 4b885731-7e42-4394-aa7d-1ed2bdaf6cbb]

BTW, a GGUF that I converted myself from the official SD3.5 model still fails to produce an image: the result is solid black.
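
For scripted testing, the request above can be built with the Python standard library. This is a sketch under assumptions: the `/v1/images/generations` path is taken from the server log earlier in this thread, the host and port are whatever the server was started with, and `build_image_request` is an illustrative name.

```python
import json
import urllib.request


def build_image_request(host, port, model, prompt, size="512x512", n=1):
    """Build (but do not send) a POST request for the image generation endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "n": n,
        "response_format": "b64_json",
        "size": size,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes it easy to print or diff the exact body when comparing a working and a failing setup.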

thxCode commented 4 days ago

@shibingli can you describe in detail how you used the official SD3.5 model to generate the GGUF that still produces a black image? I am curious about the approach you tried. llama-box is based on stable-diffusion.cpp, and I hope we can keep the experience similar.

shibingli commented 3 days ago

1. Clone stable-diffusion.cpp and compile it.

$ git clone --recurse-submodules https://github.com/leejet/stable-diffusion.cpp
Cloning into 'stable-diffusion.cpp'...
remote: Enumerating objects: 1153, done.
remote: Counting objects: 100% (301/301), done.
remote: Compressing objects: 100% (96/96), done.
remote: Total 1153 (delta 251), reused 212 (delta 205), pack-reused 852 (from 1)
Receiving objects: 100% (1153/1153), 21.79 MiB | 2.06 MiB/s, done.
Resolving deltas: 100% (699/699), done.
Submodule 'ggml' (https://github.com/ggerganov/ggml.git) registered for path 'ggml'
Cloning into '/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp/ggml'...
remote: Enumerating objects: 11833, done.
remote: Counting objects: 100% (5108/5108), done.
remote: Compressing objects: 100% (1121/1121), done.
remote: Total 11833 (delta 4293), reused 4254 (delta 3961), pack-reused 6725 (from 1)
Receiving objects: 100% (11833/11833), 11.30 MiB | 13.98 MiB/s, done.
Resolving deltas: 100% (8087/8087), done.
Submodule path 'ggml': checked out '21d3a308fcb7f31cb9beceaeebad4fb622f3c337'

$ cd stable-diffusion.cpp
$ cmake -B build -DSD_CUBLAS=ON
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Use CUBLAS as backend stable-diffusion
-- Build static library
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- OpenMP found
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.6.77") 
-- CUDA found
-- Using CUDA architectures: 52;61;70;75
-- The CUDA compiler identification is NVIDIA 12.6.77
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 13.2.0

-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/build

$ cmake --build build --config Release -j 128
[  1%] Building C object thirdparty/CMakeFiles/zip.dir/zip.c.o
[  2%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  5%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-backend.c.o
[  5%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[  7%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o
[  7%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o
[  8%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/binbcast.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/argsort.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o
[ 12%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/clamp.cu.o
[ 14%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/concat.cu.o
[ 15%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/conv-transpose-1d.cu.o
[ 16%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/convert.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cross-entropy-loss.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cpy.cu.o
[ 20%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/diagmask.cu.o
[ 21%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/dmmv.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f32.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f16.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/getrows.cu.o
In file included from /opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/thirdparty/zip.c:40:
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/thirdparty/miniz.h:4988:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 4988 | #pragma message(                                                               \
      |         ^~~~~~~
[ 28%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/im2col.cu.o
[ 29%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o
[ 30%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o
[ 32%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/norm.cu.o
[ 33%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pad.cu.o
[ 33%] Built target zip
[ 34%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pool2d.cu.o
[ 35%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/quantize.cu.o
[ 37%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/rope.cu.o
[ 38%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o
[ 39%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o
[ 41%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o
[ 42%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o
[ 43%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o
[ 44%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o
[ 47%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu.o
[ 47%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda.cu.o
[ 48%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu.o
[ 50%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu.o
[ 51%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu.o
[ 52%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu.o
[ 55%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq1_s.cu.o
[ 55%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu.o
[ 56%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_s.cu.o
[ 57%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu.o
[ 58%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_s.cu.o
[ 60%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu.o
[ 61%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu.o
[ 62%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu.o
[ 64%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q2_k.cu.o
[ 65%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q3_k.cu.o
/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2436): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced
  static void set_ggml_graph_node_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/opt/docker_builds/stable-diffusion_cpp/stable-diffusion.cpp.cuda/ggml/src/ggml-cuda.cu(2448): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced
  static bool ggml_graph_node_has_matching_properties(ggml_tensor * node, ggml_graph_node_properties * graph_node_properties) {
              ^

[ 66%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_0.cu.o
[ 67%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_1.cu.o
[ 69%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_k.cu.o
[ 70%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_0.cu.o
[ 71%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_1.cu.o
[ 73%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_k.cu.o
[ 74%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q6_k.cu.o
[ 75%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
[ 76%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cu.o
[ 79%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cu.o
[ 79%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cu.o
[ 80%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cu.o
[ 83%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cu.o
[ 84%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cu.o
[ 84%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cu.o
[ 85%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cu.o
[ 87%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cu.o
[ 88%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cu.o
[ 89%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
[ 91%] Linking CUDA static library libggml.a
[ 91%] Built target ggml
[ 92%] Building CXX object CMakeFiles/stable-diffusion.dir/model.cpp.o
[ 93%] Building CXX object CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o
[ 94%] Building CXX object CMakeFiles/stable-diffusion.dir/util.cpp.o
[ 96%] Building CXX object CMakeFiles/stable-diffusion.dir/upscaler.cpp.o
[ 97%] Linking CXX static library libstable-diffusion.a
[ 97%] Built target stable-diffusion
[ 98%] Building CXX object examples/cli/CMakeFiles/sd.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/sd
[100%] Built target sd

$ cd build/bin
$ ls
sd

2. Generate a GGUF model.

$ ./sd -M convert -m /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors -o  /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf -v --type q4_1 && ./sd -M convert -m /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors -o  /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf -v --type q8_0
Option: 
    n_threads:         64
    mode:              convert
    model_path:        /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
    wtype:             q4_1
    clip_l_path:       
    clip_g_path:       
    t5xxl_path:        
    diffusion_model_path:   
    vae_path:          
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         7.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler_a
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 1
    AVX512_VBMI = 1
    AVX512_VNNI = 1
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[INFO ] model.cpp:804  - load /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors using safetensors format
[DEBUG] model.cpp:872  - init from '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'
[INFO ] model.cpp:1794 - model tensors mem size: 5349.79MB
[DEBUG] model.cpp:1548 - loading tensors from /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
[INFO ] model.cpp:1829 - load tensors done
[INFO ] model.cpp:1830 - trying to save tensors to /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf
convert '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'/'' to '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf' success
Option: 
    n_threads:         64
    mode:              convert
    model_path:        /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
    wtype:             q8_0
    clip_l_path:       
    clip_g_path:       
    t5xxl_path:        
    diffusion_model_path:   
    vae_path:          
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         7.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler_a
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 1
    AVX512_VBMI = 1
    AVX512_VNNI = 1
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[INFO ] model.cpp:804  - load /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors using safetensors format
[DEBUG] model.cpp:872  - init from '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'
[INFO ] model.cpp:1794 - model tensors mem size: 8693.64MB
[DEBUG] model.cpp:1548 - loading tensors from /data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors
[INFO ] model.cpp:1829 - load tensors done
[INFO ] model.cpp:1830 - trying to save tensors to /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf
convert '/data/llm_models/stabilityai/stable-diffusion-3.5-large/sd3.5_large.safetensors'/'' to '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q8_0.gguf' success
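
The two conversions above differ only in the quantization type, so they can be generated from one template. The sketch below builds the argument lists using only the flags visible in the command above (`-M convert`, `-m`, `-o`, `-v`, `--type`); the output naming scheme `<stem>-<type>.gguf` mirrors the file names used here, and `convert_commands` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def convert_commands(sd_bin, source, out_dir, types):
    """Return one `sd -M convert` argument list per quantization type."""
    stem = PurePosixPath(source).stem  # "sd3.5_large.safetensors" -> "sd3.5_large"
    return [
        [
            sd_bin, "-M", "convert",
            "-m", source,
            "-o", str(PurePosixPath(out_dir) / f"{stem}-{t}.gguf"),
            "-v", "--type", t,
        ]
        for t in types
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing conversion much like the `&&` chain above.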

$ cd /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF
$ ls
sd3.5_large-q4_1.gguf  sd3.5_large-q8_0.gguf
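
As a quick sanity check on the converted files, the GGUF header can be read to compare tensor counts with the all-in-one GGUF mentioned earlier in the thread. This sketch assumes GGUF version 2 or later, where the 4-byte magic is followed by a little-endian `uint32` version and two `uint64` counts; `parse_gguf_header` is an illustrative name, and a difference in tensor count is only a hint, not a diagnosis.

```python
import struct


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first 24 bytes of a file."""
    if data[:4] != b"GGUF":
        raise ValueError("missing GGUF magic")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    if version < 2:
        raise ValueError("GGUF v1 uses a different header layout")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Reading the first 24 bytes with `open(path, "rb").read(24)` is enough input for this function.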

3. Run the model using llama-box.

$ ./llama-box --host 127.0.0.1 --port 7219 --threads 62 --parallel 4 -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --alias sd3.5-large --images --image-max-batch 4 --image-t5xxl-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --image-clip-l-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --image-clip-g-model /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors
0.00.137.105 I ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
0.00.137.109 I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
0.00.137.110 I ggml_cuda_init: found 4 CUDA devices:
0.00.139.807 I   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.143.877 I   Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.165.996 I   Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.168.633 I   Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
0.00.649.635 I 
0.00.649.640 I version: dev (457528f)
0.00.649.641 I compiler: cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
0.00.649.641 I target: x86_64-linux-gnu
0.00.649.641 I vendor: 
0.00.649.642 I - llama.cpp 4a8ccb37 (392)
0.00.649.642 I - stable-diffusion.cpp 4e75394 (181)
0.00.650.870 I system_info: n_threads = 62 (n_threads_batch = 62) / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
0.00.650.872 I 
0.00.650.921 I srv                       main: listening, hostname = 127.0.0.1, port = 7219, n_threads = 6 + 2
0.00.652.194 I srv                       main: loading model
0.00.652.229 I load_from_file: loading model from '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf'
0.00.652.263 I init_from_file: load /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf using gguf format
0.00.659.035 I load_from_file: loading clip_l from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors'
0.00.659.435 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors using safetensors format
0.00.660.086 I load_from_file: loading clip_g from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors'
0.00.661.027 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors using safetensors format
0.00.662.695 I load_from_file: loading t5xxl from '/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors'
0.00.663.057 I init_from_file: load /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors using safetensors format
0.00.663.979 I load_from_file: Version: SD3.5 Large 
0.00.665.345 I load_from_file: Weight type:                 q4_1
0.00.665.348 I load_from_file: Conditioner weight type:     f16
0.00.665.348 I load_from_file: Diffusion model weight type: q4_1
0.00.665.348 I load_from_file: VAE weight type:             q4_1
0.07.269.982 I operator(): unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
0.07.446.721 I load_from_file: total params memory size = 16050.10MB (VRAM 16050.10MB, RAM 0.00MB): clip 10648.12MB(VRAM), unet 5241.98MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
0.07.446.725 I load_from_file: loading model from '/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf' completed, taking 6.45s
0.07.446.726 I load_from_file: running in FLOW mode
0.07.446.924 I srv                 load_model: sampler: euler, steps: 10, cfg scale: 4.50
0.07.446.927 I srv                       main: initializing server
0.07.446.928 I srv                       init: initializing slots, n_slots = 4
0.07.446.929 I slot                      init: id  0 | task -1 | new slot n_ctx_slot = 0
0.07.446.933 I slot                      init: id  1 | task -1 | new slot n_ctx_slot = 0
0.07.446.934 I slot                      init: id  2 | task -1 | new slot n_ctx_slot = 0
0.07.446.935 I slot                      init: id  3 | task -1 | new slot n_ctx_slot = 0
0.07.446.954 I srv                       main: starting server

Note the "unknown tensor" message in the log above:

0.07.269.982 I operator(): unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file

4. Send a POST request to /v1/images/generations to generate an image.

{
    "model": "sd3.5-large",
    "prompt": "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution",
    "n": 1,
    "response_format": "b64_json",
    "size": "512x512"
}

3.14.266.389 I srv  oaicompat_images_generati: params: {"model":"sd3.5-large","n":1,"prompt":"...","response_format":"b64_json","size":"512x512"}
3.14.266.653 I slot     launch_slot_with_task: id  0 | task 0 | processing task
3.14.795.859 I generate_image: get_learned_condition completed, taking 528 ms
3.14.795.865 I generate_image: sampling using Euler method
3.14.795.866 I generate_image: generating image: 1/1 - seed 4294967295
3.15.425.313 I generate_image: sampling 001/010 - 0.61s/it
3.16.009.286 I generate_image: sampling 002/010 - 0.58s/it
3.16.590.767 I generate_image: sampling 003/010 - 0.58s/it
3.17.171.978 I generate_image: sampling 004/010 - 0.58s/it
3.17.753.149 I generate_image: sampling 005/010 - 0.58s/it
3.18.334.177 I generate_image: sampling 006/010 - 0.58s/it
3.18.915.758 I generate_image: sampling 007/010 - 0.58s/it
3.19.498.351 I generate_image: sampling 008/010 - 0.58s/it
3.20.080.382 I generate_image: sampling 009/010 - 0.58s/it
3.20.663.899 I generate_image: sampling 010/010 - 0.58s/it
3.20.664.512 I generate_image: sampling completed, taking 5.87s
3.20.664.516 I generate_image: generating 1 latent images completed, taking 5.87s
3.20.664.517 I generate_image: decoding 1 latents
3.20.895.215 I generate_image: latent 1 decoded, taking 0.23s
3.20.895.217 I generate_image: decode_first_stage completed, taking 0.23s
3.20.897.226 I txt2img: txt2img completed in 6.63s
3.20.909.999 I slot                   release: id  0 | task 0 | stop processing: n_past = 0, truncated = 0
3.20.910.166 I srv         log_server_request: request 200: POST /v1/images/generations 127.0.0.1:26352
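
For anyone reproducing this without a GUI client, the request in step 4 can be sent with a few lines of Python. This is only a sketch: the host, port, endpoint and JSON fields are taken from the command and log above, while the function names `build_image_request` and `generate_image_b64` are made up for illustration.

```python
import json
import urllib.request

# Host and port as passed to llama-box in step 3.
BASE_URL = "http://127.0.0.1:7219"


def build_image_request(prompt: str, size: str = "512x512",
                        model: str = "sd3.5-large") -> urllib.request.Request:
    """Build the POST request for /v1/images/generations (hypothetical helper)."""
    body = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "response_format": "b64_json",
        "size": size,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image_b64(prompt: str) -> str:
    """Send the request and return the base64 string of the first image."""
    with urllib.request.urlopen(build_image_request(prompt)) as resp:
        payload = json.load(resp)
    return payload["data"][0]["b64_json"]
```

Calling `generate_image_b64("a lovely cat")` against a running server should return the same kind of `b64_json` string shown in step 5.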

5. Receive the response from the request.

{
    "created": 1732000352,
    "data": [
        {
            "b64_json": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAAd1ElEQVR4XmMYBaMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgK
jITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgK
jITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgK
jITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgK
jITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgK
jITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhMBoCoyEwGgKjITAaAqMhQEIIAAACtAAB03JHdwAAAABJRU5ErkJggA=="
        }
    ],
    "model": "sd3.5-large",
    "object": "list"
}
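
To confirm that a response like the one above really is a solid-black image without opening it in a viewer, the PNG can be inspected directly. The following is a minimal sketch using only the Python standard library. It assumes a non-interlaced 8-bit or 16-bit greyscale or RGB PNG, which matches the IHDR of the payload above (512x512, 8-bit RGB). The name `is_solid_black_png` is hypothetical.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for the colour types handled here (greyscale, truecolour).
CHANNELS = {0: 1, 2: 3}


def is_solid_black_png(b64_png: str) -> bool:
    """Return True if the base64-encoded PNG decodes to all-zero pixels."""
    raw = base64.b64decode(b64_png)
    if raw[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")

    pos, header, idat = 8, None, b""
    while pos + 8 <= len(raw):
        length, ctype = struct.unpack(">I4s", raw[pos:pos + 8])
        data = raw[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif ctype == b"IDAT":
            idat += data
        elif ctype == b"IEND":
            break
        pos += 12 + length  # length field + type + data + CRC

    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, colour, _, _, interlace = header
    if colour not in CHANNELS or depth not in (8, 16) or interlace:
        raise ValueError("unsupported PNG layout for this sketch")

    stride = width * CHANNELS[colour] * depth // 8
    pixels = zlib.decompress(idat)
    # Each scanline is one filter-type byte followed by `stride` filtered bytes.
    # If every filtered byte is zero, every reconstructed sample is zero too,
    # so there is no need to undo the PNG filters.
    for row in range(height):
        start = row * (stride + 1) + 1
        if any(pixels[start:start + stride]):
            return False
    return True
```

Passing the `b64_json` value from a response to `is_solid_black_png` gives a quick yes/no answer that can be used in a script when comparing GPU and CPU runs.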

6. Test directly in a browser.

7. The browser displays the returned image, which is completely black.

image

shibingli commented 3 days ago

./sd -m /data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf --t5xxl /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/t5xxl_fp16.safetensors --clip_l /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_l.safetensors --clip_g /data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders/clip_g.safetensors -p 'a lovely cat holding a sign says "Stable diffusion 3.5 Large"' --cfg-scale 4.5 --sampling-method euler --steps 20

output

thxCode commented 1 day ago

Following your steps, I found something interesting.

On an Apple M1 Max, the result is as expected.

image

On an NVIDIA 4090, the result is still solid black until I keep the text encoders on the CPU (with --image-no-text-encoder-model-offload).

image

thxCode commented 1 day ago

Unlike stable-diffusion.cpp, llama-box tries to offload all components to the GPU by default. For models with a T5XXL text encoder, you can pass --image-no-text-encoder-model-offload as a workaround to avoid solid-black images.
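
For scripted deployments, the workaround amounts to appending one flag to the launch command. Below is a rough sketch that assembles the argument list. Every flag comes from the commands in this thread, but the helper name `llama_box_argv` and its `text_encoders_on_gpu` parameter are invented for this example.

```python
import shlex


def llama_box_argv(model: str, t5xxl: str, clip_l: str, clip_g: str,
                   text_encoders_on_gpu: bool = False) -> list[str]:
    """Assemble a llama-box command line for an SD3.5 model (hypothetical helper)."""
    argv = [
        "./llama-box", "--host", "127.0.0.1", "--port", "7219",
        "-m", model, "--alias", "sd3.5-large", "--images",
        "--image-t5xxl-model", t5xxl,
        "--image-clip-l-model", clip_l,
        "--image-clip-g-model", clip_g,
    ]
    if not text_encoders_on_gpu:
        # Workaround from this thread: do not offload the text encoders to the GPU.
        argv.append("--image-no-text-encoder-model-offload")
    return argv


if __name__ == "__main__":
    encoders = "/data/llm_models/Comfy-Org/stable-diffusion-3.5-fp8/text_encoders"
    print(shlex.join(llama_box_argv(
        "/data/llm_models/stabilityai/stable-diffusion-3.5-large-GGUF/sd3.5_large-q4_1.gguf",
        encoders + "/t5xxl_fp16.safetensors",
        encoders + "/clip_l.safetensors",
        encoders + "/clip_g.safetensors",
    )))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run` directly.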