leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

miniSD/nanoSD (256x256 and 128x128 image generation) results #28

walking-octopus opened this issue 1 year ago

walking-octopus commented 1 year ago

I'm sorry for polluting the GitHub issues with non-bugs, but since that precedent was already set by #1 and there's no Discussions enabled, I thought it might be appropriate to share this here.

Laptop CPUs are always rather underpowered. As said in #15, even old desktop CPUs perform much better than modern mid-range laptops. Even more so, phones and ARM micro-computers are laughably slow.

Sampling can be sped up considerably by using a lower resolution, but models expectedly perform very poorly at resolutions lower than they were trained on, producing colorful abstract shapes that only vaguely resemble the intended objects.

But someone on HuggingFace managed to fine-tune the model on 256x256 and 128x128 images to the point of getting coherent outputs!

This is great news for CPU inference, since the sampling time was cut in half! The outputs might have looked slightly less detailed, but were perfectly coherent.
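(For a rough sense of why lower resolutions help so much on CPU, here is a back-of-the-envelope sketch; this is illustrative C++, not code from this repo.)

#include <cstdio>

// Back-of-the-envelope sketch (not code from this repo): Stable Diffusion's
// VAE downsamples images by a factor of 8, so the UNet denoises a
// (W/8) x (H/8) x 4 latent tensor at every sampling step.
int main() {
    const int resolutions[] = {512, 256, 128};
    for (int res : resolutions) {
        int lw = res / 8, lh = res / 8;       // latent width/height
        long long elems = 1LL * lw * lh * 4;  // 4 latent channels
        printf("%dx%d image -> %dx%dx4 latent (%lld elements)\n",
               res, res, lw, lh, elems);
    }
    // Convolution cost scales roughly with the number of latent pixels and
    // self-attention roughly with its square, so halving the resolution can
    // easily cut per-step time in half or better on CPU.
    return 0;
}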

I haven't investigated whether there are any differences in outputs between stable-diffusion.cpp and the official implementation, or whether quantization has a greater impact at lower resolutions, but this does seem promising for real-world usage of this project.

leejet commented 1 year ago

Discussions is indeed a good place for discussions and sharing. I've set up Discussions for this project. Would you mind moving this issue to Discussions?

leejet commented 1 year ago

By the way, I took a look at these two projects, and they seem to be fine-tuning weights on official models. So, they should be applicable to this project.

walking-octopus commented 1 year ago

Discussions is indeed a good place for discussions and sharing. I've set up Discussions for this project. Would you mind moving this issue to Discussions?

Sure, I don't mind.

By the way, I took a look at these two projects, and they seem to be fine-tuning weights on official models. So, they should be applicable to this project.

Yes, I've even tested one of them and got perfectly coherent outputs. I was just wondering if this implementation matches the official implementation perfectly, and if quantization has any special effects when lowering the resolution.

leejet commented 1 year ago

Quantization doesn't have any special effects when lowering the resolution. Quantization mainly involves sacrificing some computational precision in exchange for lower memory and storage usage.
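(To put rough numbers on that trade-off, here is an illustrative sketch; it assumes ggml's usual block layouts and a ballpark SD1.x UNet size of about 860M parameters, and is not code or figures from this repo.)

#include <cstdio>

// Rough sketch of the storage side of quantization (assumed ggml block sizes,
// not code from this repo): lower-precision types shrink the weights on disk
// and in memory at the cost of some computational precision.
int main() {
    const double n_params = 860e6;  // ballpark SD1.x UNet parameter count
    struct Type { const char* name; double bytes_per_weight; };
    const Type types[] = {
        {"f32", 4.0},
        {"f16", 2.0},
        {"q8_0", 34.0 / 32.0},  // 32 8-bit weights + one f16 scale per block
        {"q4_0", 18.0 / 32.0},  // 32 4-bit weights + one f16 scale per block
    };
    for (const Type& t : types) {
        printf("%-5s ~%5.0f MB\n", t.name, n_params * t.bytes_per_weight / 1e6);
    }
    return 0;
}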

nviet commented 1 year ago

Even more so, phones and ARM micro-computers are laughably slow.

Yeah, no doubt. IMO it's because Stable Diffusion hasn't been optimized for such devices yet. My phone, which has a chipset released two years ago, took 55 minutes to generate a 512x512 image using the code in this repo (which runs purely on the CPU). But with a different approach to Stable Diffusion in which Vulkan is enabled, it took only 13 minutes.

In a paper published by researchers from Google (link, link), the results are even more impressive when GPU support is available. Sadly, no code has been released yet, but there is clearly potential for such devices.

After applying all of these optimizations, we conducted tests of Stable Diffusion 1.5 (image resolution 512x512, 20 iterations) on high-end mobile devices. Running Stable Diffusion with our GPU-accelerated ML inference model uses 2,093MB for the weights and 84MB for the intermediate tensors. With latest high-end smartphones, Stable Diffusion can be run in under 12 seconds.

JohnClaw commented 1 year ago
  • (Not yet tested, need to convert to .ckpt first)

I made ckpt and ggml versions: https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ckpt/resolve/main/stable-diffusion-nano-2-1.ckpt https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ggml/resolve/main/stable-diffusion-nano-2-1-ggml-f32.bin

P.s: I tried to run f32 model but got this error: [ERROR] stable-diffusion.cpp:2893 - tensor 'model.diffusion_model.input_blocks.1.1.proj_in.weight' has wrong shape in model file: got [320, 320, 1, 1], expected [1, 1, 320, 320] Can it be fixed somehow?

I used this script to make ckpt: https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_diffusers_to_original_stable_diffusion.py

I ran it this way: python convert_diffusers_to_original_stable_diffusion.py --model_path SD_nano/ --checkpoint_path SD_nano.ckpt

JohnClaw commented 1 year ago

I also managed to find a ckpt for an older version of SD nano: https://huggingface.co/Simona198710/stable-diffusion-nano-ckpt/resolve/main/stable-diffusion-nano.ckpt.ckpt

I converted it to f32 ggml and it worked.

I ran it this way: sd.exe -m stable-diffusion-nano.ckpt-ggml-model-f32.bin -t 8 --steps 40 --height 128 --width 128 --seed -1 -p "a female face"

It took 63 seconds to generate the image on a Ryzen 7 4700U (turbo boost was off): image

leejet commented 1 year ago

P.s: I tried to run f32 model but got this error: [ERROR] stable-diffusion.cpp:2893 - tensor 'model.diffusion_model.input_blocks.1.1.proj_in.weight' has wrong shape in model file: got [320, 320, 1, 1], expected [1, 1, 320, 320] Can it be fixed somehow?

https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ckpt/resolve/main/stable-diffusion-nano-2-1.ckpt is a model from SD2.x, which is currently not supported. I suggest you use models from SD1.x for now (for example, https://huggingface.co/Simona198710/stable-diffusion-nano-ckpt/resolve/main/stable-diffusion-nano.ckpt.ckpt), as I plan to add support for SD2.x models in the near future.

leejet commented 1 year ago

@JohnClaw Support for SD2.x has been added, and you can try it out by pulling the latest code, converting the model file, and rebuilding the executable.

Here is an example https://github.com/leejet/stable-diffusion.cpp/discussions/41

JohnClaw commented 1 year ago

Support for SD2.x has been added, and you can try it out by pulling the latest code, converting the model file, and rebuilding the executable.

I did as you advised but got this error:

C:\sd>sd.exe -m stable-diffusion-nano-2-1-ggml-model-f32.bin -t 8 --steps 10 --height 128 --width 128 --seed -1 -p "a wolf wearing sun glasses, highly detailed"
[INFO] stable-diffusion.cpp:2793 - loading model from 'stable-diffusion-nano-2-1-ggml-model-f32.bin'
[INFO] stable-diffusion.cpp:2821 - model type: SD2.x
[INFO] stable-diffusion.cpp:2829 - ftype: f32
[WARN] stable-diffusion.cpp:2991 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.layer_norm1.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.layer_norm1.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.layer_norm2.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.layer_norm2.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.mlp.fc1.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.mlp.fc1.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.mlp.fc2.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.mlp.fc2.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.k_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.k_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.out_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.out_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.q_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.q_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.v_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3033 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.23.self_attn.v_proj.weight' not in model file

leejet commented 1 year ago

I have fixed the problem. You can pull the latest code and rebuild the executable.

./bin/sd -m ../models/stable-diffusion-nano-2-1-ggml-model-f32.bin -t 8 --steps 10 --height 128 --width 128 --seed -1 -p "a wolf wearing sun glasses, highly detailed"

output

JohnClaw commented 1 year ago

I have fixed the problem. You can pull the latest code and rebuild the executable.

It raised a new error:

C:\sd>sd.exe -m stable-diffusion-nano-2-1-ggml-model-f32.bin -t 8 --steps 10 --height 128 --width 128 --seed -1 -p "a wolf wearing sun glasses, highly detailed"
[INFO] stable-diffusion.cpp:2793 - loading model from 'stable-diffusion-nano-2-1-ggml-model-f32.bin'
[INFO] stable-diffusion.cpp:2821 - model type: SD2.x
[INFO] stable-diffusion.cpp:2829 - ftype: f32
[WARN] stable-diffusion.cpp:2991 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file
[INFO] stable-diffusion.cpp:3057 - total params size = 3621.07MB (clip 1346.65MB, unet 2179.92MB, vae 94.51MB)
[INFO] stable-diffusion.cpp:3059 - loading model from 'stable-diffusion-nano-2-1-ggml-model-f32.bin' completed, taking 4.99s
[INFO] stable-diffusion.cpp:3188 - check is_using_v_parameterization_for_sd2 completed, taking 0.25s
[INFO] stable-diffusion.cpp:3084 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:3316 - condition graph use 1351.93MB of memory: params 1346.65MB, runtime 5.29MB (static 1.38MB, dynamic 3.91MB)
[INFO] stable-diffusion.cpp:3316 - condition graph use 1351.93MB of memory: params 1346.65MB, runtime 5.29MB (static 1.38MB, dynamic 3.91MB)
[INFO] stable-diffusion.cpp:3854 - get_learned_condition completed, taking 1.34s
[INFO] stable-diffusion.cpp:3870 - start sampling

Sampling failed. Image wasn't created.

leejet commented 1 year ago

It should be fixed by now. You can pull the latest code and rebuild the executable.

JohnClaw commented 1 year ago

It should be fixed by now. You can pull the latest code and rebuild the executable.

Thank you. It works, and an image was generated. However, the error message still remains in the console log:

[WARN] stable-diffusion.cpp:2991 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file

leejet commented 1 year ago

[WARN] stable-diffusion.cpp:2991 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file

Don't worry, this won't have any effect on image generation. You can just ignore it.
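(A minimal sketch of why such a warning is harmless, using hypothetical names rather than the repo's actual loader: a file tensor with no destination in the runtime graph is simply skipped, and 'position_ids' is a fixed index buffer rather than a learned weight.)

#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical loader sketch (not the repo's code): weights whose names are
// known get copied into the graph; anything else only triggers a warning.
void load_tensor(std::map<std::string, std::vector<float>>& graph_tensors,
                 const std::string& name, const std::vector<float>& data) {
    auto it = graph_tensors.find(name);
    if (it == graph_tensors.end()) {
        printf("[WARN] unknown tensor '%s' in model file\n", name.c_str());
        return;  // nothing in the graph needs it, so generation is unaffected
    }
    it->second = data;  // copy the tensor data into the graph
}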

paulocoutinhox commented 11 months ago

Hi,

I have the same error with the latest version.


This model: https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ggml/tree/main

Error:

[INFO]  stable-diffusion.cpp:2832 - loading model from '/Users/paulo/Downloads/stable-diffusion-nano-2-1-ggml-f32.bin'
[INFO]  stable-diffusion.cpp:2860 - model type: SD1.x
[INFO]  stable-diffusion.cpp:2868 - ftype: f32
[ERROR] stable-diffusion.cpp:3047 - tensor 'model.diffusion_model.input_blocks.1.1.proj_in.weight' has wrong shape in model file: got [320, 320, 1, 1], expected [1, 1, 320, 320]
[2023-10-27 15:00:07.824] [error] [MappingStableDiffusion :: callbackGenerate] Error while load model: /Users/paulo/Downloads/stable-diffusion-nano-2-1-ggml-f32.bin

Code:

const std::string &model = modelOpt.value();
const std::string &prompt = promptOpt.value();

bool vaeDecodeOnly = true;

const char *rngTypeToStr[] = {
    "std_default",
};

Schedule schedule = DEFAULT;
std::string negative_prompt;
float cfg_scale = 7.0f;
int w = 512;
int h = 512;
SampleMethod sample_method = EULER_A;
int sample_steps = 20;
float strength = 0.75f;
RNGType rng_type = CUDA_RNG;
int64_t seed = 42;

StableDiffusion sd(4, vaeDecodeOnly, true);

if (!sd.load_from_file(model, schedule))
{
    spdlog::error("[MappingStableDiffusion :: callbackGenerate] Error while load model: {}", model);
    r(Image{"ERROR-LOAD-MODEL"});
    return;
}

std::vector<uint8_t> img = sd.txt2img(
    prompt,
    negative_prompt,
    cfg_scale,
    w,
    h,
    sample_method,
    sample_steps,
    seed);

Model list:

What models can I use with this code? Is there a list of them? 128, 256, or 512x512 models?

Thanks.

leejet commented 11 months ago

This is because the model was converted using the older models/convert.py, which did not support SD2.x at the time. You can use the latest models/convert.py to convert the original checkpoint: https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ckpt/resolve/main/stable-diffusion-nano-2-1.ckpt

paulocoutinhox commented 11 months ago

I'm using the current master script:

paulo ~/Developer/workspaces/python/stable-diffusion.cpp/models [master] $ python3 convert.py                                                                                                                            
usage: convert.py [-h] [--out_type {f32,f16,q4_0,q4_1,q5_0,q5_1,q8_0}] [--out_file OUT_FILE] model_path

leejet commented 11 months ago

What was the original model you used before the conversion? I used the latest conversion script to convert this model (https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ckpt/resolve/main/stable-diffusion-nano-2-1.ckpt), and I didn't encounter this issue.

By the way, please don't download the ggml model (https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ggml/tree/main) directly. You need to download the ckpt (https://huggingface.co/NikolayKozloff/stable-diffusion-nano-2-1-ckpt/resolve/main/stable-diffusion-nano-2-1.ckpt) file and then convert it.
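(For reference, based on the convert.py usage string shown earlier in this thread, the conversion would presumably be invoked along these lines; the output file name here is only an example:

python3 convert.py --out_type f32 --out_file stable-diffusion-nano-2-1-ggml-f32.bin stable-diffusion-nano-2-1.ckpt
)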

paulocoutinhox commented 10 months ago

Hi,

Now I can convert it and it generates the model.

But when I try it on my Mac M1, it takes around 87 seconds for each sampling step:

[2023-11-21 21:26:42.053] [debug] [MappingStableDiffusion :: callbackGenerate] Generating subtitles...
[INFO]  stable-diffusion.cpp:2897 - loading model from '/Users/paulo/Downloads/stable-diffusion-cpp/models/stable-diffusion-nano-2-1.bin'
[INFO]  stable-diffusion.cpp:2925 - model type: SD2.x
[INFO]  stable-diffusion.cpp:2933 - ftype: f32
[WARN]  stable-diffusion.cpp:3093 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file
[INFO]  stable-diffusion.cpp:3159 - total params size = 3686.36MB (clip 1346.66MB, unet 2179.94MB, vae 159.77MB)
[INFO]  stable-diffusion.cpp:3161 - loading model from '/Users/paulo/Downloads/stable-diffusion-cpp/models/stable-diffusion-nano-2-1.bin' completed, taking 1.85s
[INFO]  stable-diffusion.cpp:3311 - check is_using_v_parameterization_for_sd2 completed, taking 1.24s
[INFO]  stable-diffusion.cpp:3186 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:3196 - running with Karras schedule
[INFO]  stable-diffusion.cpp:4653 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:3740 - condition graph use 1421.76MB of memory: params 1346.66MB, runtime 75.11MB (static 11.04MB, dynamic 64.07MB)
[INFO]  stable-diffusion.cpp:3740 - condition graph use 1421.76MB of memory: params 1346.66MB, runtime 75.11MB (static 11.04MB, dynamic 64.07MB)
[INFO]  stable-diffusion.cpp:4662 - get_learned_condition completed, taking 0.79s
[INFO]  stable-diffusion.cpp:4678 - start sampling
[INFO]  stable-diffusion.cpp:3936 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:3924 - step 1 sampling completed, taking 87.21s
[INFO]  stable-diffusion.cpp:3924 - step 2 sampling completed, taking 87.21s

Am I doing something wrong?