psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License
425 stars 37 forks

[suggestion] llama.cpp CLBlast support. #37

Closed MikeLP closed 9 months ago

MikeLP commented 9 months ago

Is it possible to build llama.cpp (I believe it's your binary freechatserver) with CLBlast support? It's supposed to work great on CPU and gives good acceleration on regular Macs!

In case you don't want to change anything, could you please provide instructions on how to build the freechatserver binary, so I can replace it with my own build of llama.cpp?
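For reference, llama.cpp's Makefile at the time exposed a CLBlast switch. A minimal sketch of what such a build might look like (the `LLAMA_CLBLAST` flag and the Homebrew package name are assumptions based on llama.cpp's own build docs, not something FreeChat documents):

```shell
# Hedged sketch: building llama.cpp's server with CLBlast enabled.
# Assumes the CLBlast library is installed first, e.g.:
#   brew install clblast
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_CLBLAST=1 make server   # LLAMA_CLBLAST=1 is llama.cpp's Makefile switch
```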

psugihara commented 9 months ago

I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise Apple rejects the app in review with a report of using private APIs. Is that what turns on CLBlast support?

This issue has more details: https://github.com/ggerganov/llama.cpp/issues/3438

If you want to try building locally without that flag, you can pull llama.cpp, run make, then replace freechat/mac/FreeChat/Models/NPC/freechat-server with the produced server binary. (I turn that into a universal x86/arm64 binary with lipo, but you can just rename server to freechat-server if you only need one architecture to work.) You will also need to copy ggml-metal.metal to the same directory.
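The steps above can be sketched roughly as follows (the FreeChat checkout path is illustrative; everything else follows the comment):

```shell
# Sketch of a local single-architecture build replacing FreeChat's bundled server.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make server

# Replace the bundled binary and copy the Metal shader source alongside it.
# FREECHAT is a hypothetical path to your FreeChat checkout.
FREECHAT=~/src/freechat
cp server "$FREECHAT/mac/FreeChat/Models/NPC/freechat-server"
cp ggml-metal.metal "$FREECHAT/mac/FreeChat/Models/NPC/"
```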

Let me know if you see perf gains. I didn't see any change in time to response or tokens/second with the LLAMA_NO_ACCELERATE flag (I'm on an M1 Pro with 64GB RAM).

sussyboiiii commented 9 months ago

I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise apple rejects the app in review with a report of using private APIs.

I'm not experienced with this, but couldn't you just release the app via both GitHub and the App Store, so you can ship updates earlier on GitHub and use features Apple doesn't allow?

psugihara commented 9 months ago

Technically I could, but ideally I don't want to support 2 versions of the app, and I did not see any performance improvement without that flag. Please let me know if your experience differs and I'll re-assess the trade-off. There are good reasons for Apple not to allow access to private APIs (OS patches could break the app).

sussyboiiii commented 9 months ago

Fair enough.

MikeLP commented 9 months ago

@psugihara I appreciate your response, I will let you know if it works. I was just concerned that the size of your server binary and the size of the one I built are pretty different.

psugihara commented 9 months ago

yep, should be about half the size if you're just building for one architecture. The lipo command I use just glues the 2 together.
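For the curious, gluing two per-architecture builds into one universal binary with lipo looks roughly like this (the per-arch filenames here are hypothetical; the comment doesn't specify them):

```shell
# Sketch: combine an x86_64 build and an arm64 build into one universal binary.
# server-x86_64 and server-arm64 are hypothetical names for the two builds.
lipo -create server-x86_64 server-arm64 -output freechat-server

# Inspect the result; should report both x86_64 and arm64 slices.
lipo -info freechat-server
```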