guinmoon / LLMFarm

Run llama and other large language models offline on iOS and macOS using the GGML library.
https://llmfarm.site
MIT License

Add in-app support for a flash attention option #76

Closed · DKNTZMN closed this 1 week ago

DKNTZMN commented 2 weeks ago

Please add support for an in-app flash attention option for the model. The current model spits out nonsense when running without flash attention. Thank you.

MiniPhantom commented 1 week ago

I'm pretty sure you can enable it in the JSON chat settings file.
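
For reference, a minimal sketch of what that might look like in a chat settings JSON file. The key name `flash_attn` is an assumption based on the corresponding llama.cpp parameter, and the other fields shown are illustrative placeholders, so check your actual settings file for the exact field names it uses:

```json
{
  "title": "My Chat",
  "model": "model.gguf",
  "flash_attn": true
}
```

If the app ignores an unknown key, the setting silently has no effect, so it's worth verifying against a settings file exported by the app itself.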