leejet / stable-diffusion.cpp

Stable Diffusion in pure C/C++
MIT License

doesn't work properly on Xiaomi 14 #220

Open zhouwg opened 3 months ago

zhouwg commented 3 months ago

Hi,

Thanks for your amazing stable-diffusion.cpp.

I tried to integrate it into a personal study project, but it doesn't work as expected on a Xiaomi 14.

https://github.com/zhouwg/kantv/blob/master/core/ggml/jni/ggml-jni-impl.cpp#L905

https://github.com/zhouwg/kantv/blob/master/core/ggml/jni/ggml-jni-impl-external.cpp#L1866

It fails in this function:

https://github.com/zhouwg/kantv/blob/master/core/ggml/stablediffusioncpp/model.cpp#L742

By the way, the latest upstream GGML source code was used during the integration, so the following two lines had to be modified accordingly:

https://github.com/zhouwg/kantv/blob/master/core/ggml/stablediffusioncpp/model.cpp#L568

https://github.com/zhouwg/kantv/blob/master/core/ggml/stablediffusioncpp/model.cpp#L601

The model comes from:

```sh
curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors
/bin/sd -M convert -m v2-1_768-nonema-pruned.safetensors -o v2-1_768-nonema-pruned.q8_0.gguf -v --type q8_0
```

It works well on Ubuntu 20.04:

```
zhouwg:$ ./build/bin/sd  -m v2-1_768-nonema-pruned.q8_0.gguf  -p "a luxury car in front of a house"

[INFO ] stable-diffusion.cpp:194  - Stable Diffusion 2.x
[INFO ] stable-diffusion.cpp:200  - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:406  - total params memory size = 1858.54MB (VRAM 0.00MB, RAM 1858.54MB): clip 358.79MB(RAM), unet 1405.28MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:425  - loading model from 'v2-1_768-nonema-pruned.q8_0.gguf' completed, taking 0.55s
[INFO ] stable-diffusion.cpp:440  - running in v-prediction mode
[INFO ] stable-diffusion.cpp:553  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1608 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1719 - get_learned_condition completed, taking 347 ms
[INFO ] stable-diffusion.cpp:1735 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1739 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 10.97s/it
[INFO ] stable-diffusion.cpp:1776 - sampling completed, taking 219.13s
[INFO ] stable-diffusion.cpp:1784 - generating 1 latent images completed, taking 219.18s
[INFO ] stable-diffusion.cpp:1786 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1796 - latent 1 decoded, taking 17.18s
[INFO ] stable-diffusion.cpp:1800 - decode_first_stage completed, taking 17.18s
[INFO ] stable-diffusion.cpp:1817 - txt2img completed in 236.71s
save result image to 'output.png'
```

(output image attached)

FSSRepo commented 3 months ago

Hello, if you provide me with more information, I could help you. If you could provide a screenshot of the error, it would be even better, especially if it's a copy of the CLI output to identify the type of error. If it generates an image but it's not the one you expected, please attach both the result and the expected image.

zhouwg commented 3 months ago

> Hello, if you provide me with more information, I could help you. If you could provide a screenshot of the error, it would be even better, especially if it's a copy of the CLI output to identify the type of error. If it generates an image but it's not the one you expected, please attach both the result and the expected image.

Thanks for your quick and warm comment.

The issue is that the SD process crashes on the Xiaomi 14: https://github.com/zhouwg/kantv/blob/master/core/ggml/jni/ggml-jni-impl.cpp#L905

This personal AI study project is a turn-key Android project, so you can reproduce the issue very easily on a Xiaomi 14 or any other mainstream Android phone (modify this line accordingly: https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L16).

FSSRepo commented 3 months ago

Reviewing your comment again: I suggest you not use the latest version of ggml (anyway, there is no significant improvement in any respect), as it introduces many changes that may break some of the code in sd.cpp. Use the version pinned by the master branch instead.
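For reference, a sketch of how to pick up the pinned revision rather than upstream ggml HEAD, assuming ggml is vendored as a git submodule of stable-diffusion.cpp:

```
# Clone with the ggml revision the master branch pins, not upstream HEAD
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

# Or, in an existing clone, check out the pinned submodule commit
git submodule update --init
```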

FSSRepo commented 3 months ago

> this personal AI study project is an Android turn-key project, you can reproduce this issue very easily on Xiaomi 14 or any other mainstream Android phone.

I will try to test it; I hope it compiles on the first try, since my time is very limited. I think it could also be because the model is very heavy, and Android struggles with handling large data sizes. I suggest you try q4_0 quantization.

zhouwg commented 3 months ago

> Reviewing your comment again, I suggest you do not use the latest version of ggml (Anyway, there is no significant improvement in any aspect) as it introduces many changes that may end up breaking some of the code in sd.cpp. Use the one that is default in the master branch instead.

thanks

zhouwg commented 3 months ago

> this personal AI study project is an Android turn-key project, you can reproduce this issue very easily on Xiaomi 14 or any other mainstream Android phone.
>
> I will try to test it, I hope it compiles on the first try since my time is very limited. I think it could also be because the model is very heavy, and Android struggles with handling large data sizes. I suggest you try using q4_0 quantization.

Google's Gemma model runs well on the Xiaomi 14 using llama.cpp.

I'll try q4_0 quantization later. Thanks so much.