leejet / stable-diffusion.cpp

Stable Diffusion in pure C/C++
MIT License
2.91k stars, 233 forks

slow ggml_vec_dot_f16 operator on Android #190

Open · Jimskns opened this issue 4 months ago

Jimskns commented 4 months ago

Hi @leejet, I compiled this project with CLBlast support and ran sd on my Android phone. It runs successfully, but it's quite slow: about 70 s per iteration. I profiled it with perf, converted the output to a flame graph, and found that ggml_vec_dot_f16 accounts for over 80% of the runtime. Does this op support Adreno GPU acceleration? What's the reason behind this?

[Image: SD-perf flame graph]

Thanks a lot~
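For context on the profile above: ggml_vec_dot_f16 computes the dot product of two half-precision (FP16) vectors, and because it shows up in a perf flame graph it is being sampled while running on the CPU. The sketch below is a minimal scalar illustration of that arithmetic, not ggml's actual implementation (which uses SIMD paths where available). The names fp16_to_fp32 and dot_f16_scalar are hypothetical, and FP16 values are assumed to be stored as raw IEEE 754 binary16 bits in a uint16_t.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 value (raw bits in a uint16_t) to float. */
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value is mant * 2^-24, exactly representable in float. */
        f = (float)mant * 0x1p-24f;
        return sign ? -f : f;
    }
    if (exp == 0x1Fu) {
        /* Infinity or NaN: widen the exponent field, keep the payload bits. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else {
        /* Normal number: rebias the exponent from 15 (half) to 127 (float). */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar reference for an FP16 dot product (not ggml's code).
 * Each element costs two FP16-to-float conversions plus a multiply-add. */
static float dot_f16_scalar(int n, const uint16_t *x, const uint16_t *y) {
    double sum = 0.0; /* wider accumulator to limit rounding error */
    for (int i = 0; i < n; i++) {
        sum += (double)fp16_to_fp32(x[i]) * (double)fp16_to_fp32(y[i]);
    }
    return (float)sum;
}
```

For example, with x = {1.0, 2.0, 0.5} encoded as {0x3C00, 0x4000, 0x3800} and y = {3.0, -1.0, 4.0} encoded as {0x4200, 0xBC00, 0x4400}, dot_f16_scalar(3, x, y) returns 3.0. Optimized builds replace this loop with SIMD intrinsics, but the per-element work is the same in kind, which is why a large share of runtime can land in this one function when it stays on the CPU.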

FSSRepo commented 4 months ago

I think it would be better to support a Vulkan backend for acceleration on Android devices, as ggml currently lacks good OpenCL support (it is even considered obsolete). Unfortunately, I don't know enough about Vulkan to implement the kernels for the operations (I started watching some videos a few weeks ago because I want to stop using OpenGL).