leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

ggml-metal.m - GGML_ASSERT: ne00 % 4 == 0 when generating images of dimensions 640x640 #193

Open phudtran opened 8 months ago

phudtran commented 8 months ago

This seems to be an issue with group_norm on the Metal backend; I haven't tried other backends.

./bin/sd -m models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors -p "a cat" --steps 2 -H 640 -W 640
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading 'stable-diffusion.cpp/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
[INFO ] stable-diffusion.cpp:142  - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors'
[INFO ] model.cpp:676  - load models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:164  - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:170  - Stable Diffusion weight type: f32
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   469.45 MiB, (  471.33 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  2155.34 MiB, ( 2626.67 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    94.47 MiB, ( 2721.14 / 21845.34)
[INFO ] stable-diffusion.cpp:306  - total params memory size = 1408.32MB (clip 469.44MB, unet 2155.33MB, vae 94.47MB, controlnet 0.00MB)
[INFO ] stable-diffusion.cpp:310  - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors' completed, taking 1.03s
[INFO ] stable-diffusion.cpp:327  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1374 - apply_loras completed, taking 0.00s
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     1.41 MiB, ( 2722.55 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     1.41 MiB, ( 2722.55 / 21845.34)
[INFO ] stable-diffusion.cpp:1413 - get_learned_condition completed, taking 69 ms
[INFO ] stable-diffusion.cpp:1429 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1433 - generating image: 1/1 - seed 42
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  1320.61 MiB, ( 3572.30 / 21845.34)
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
zsh: abort      ./bin/sd -m  -p "a cat" --steps 2 -H 640 -W 640

phudtran commented 8 months ago

The assert also fires for every square image size I tried except 512x512, 768x768, and 1024x1024 (I haven't tested anything larger than 1024x1024).
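
One possible explanation for this pattern, not confirmed anywhere in this thread: the check fails when the smallest feature map in the UNet has a width that is not a multiple of 4. The sketch below assumes that `ne00` is the width of the tensor handed to group_norm, that the SD 1.x VAE reduces the image by a factor of 8, and that the UNet halves the latent three more times. The function names are hypothetical and are not taken from stable-diffusion.cpp or ggml.

```python
# Assumptions (not taken from the ggml source): SD 1.x geometry, and ne00 == feature-map width.
VAE_FACTOR = 8        # the SD 1.x VAE maps pixels to latents at 1/8 resolution
UNET_DOWNSAMPLES = 3  # the SD 1.x UNet halves the latent resolution three times


def feature_map_widths(image_width: int) -> list[int]:
    """Assumed UNet feature-map widths, from the full latent down to the deepest level."""
    latent = image_width // VAE_FACTOR
    return [latent >> level for level in range(UNET_DOWNSAMPLES + 1)]


def passes_assert(image_width: int) -> bool:
    """True if every assumed width would satisfy a `width % 4 == 0` check."""
    return all(w % 4 == 0 for w in feature_map_widths(image_width))


if __name__ == "__main__":
    for width in (512, 640, 768, 1024):
        print(width, feature_map_widths(width), passes_assert(width))
```

Under these assumptions only widths that are multiples of 256 pass, which matches the three sizes reported as working. It does not account for the later reports in this thread where every size fails, so treat it as a hypothesis rather than a diagnosis.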

leejet commented 8 months ago

This seems to be an issue with the implementation of the ggml Metal backend. You can try removing the corresponding assert to see if the issue persists.

smasyutin commented 7 months ago

I get the same assert, but with the v2-1_768-ema-pruned.safetensors model on an M1 Pro. No image dimensions work.

Any suggestions on workaround/fix?

remixer-dec commented 6 months ago

Same issue at all dimensions. After commenting out the assert line in ggml-metal.m, it just works, with no noticeable difference in the output.