persimmon-ai-labs / adept-inference

Inference code for Persimmon-8B
https://www.adept.ai/
Apache License 2.0
416 stars 23 forks

Llama.cpp Support #6

Closed loretoparisi closed 10 months ago

loretoparisi commented 1 year ago

Exploring possibilities to support GGML / GGUF formats to run with Llama.cpp

VincentJGeisler commented 1 year ago

The model is missing some keys and couldn't be converted to GGUF format:

'rms_norm_eps'
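
For anyone who wants to check a model config before attempting a conversion, here is a minimal sketch of a missing-key check. It assumes the config is a JSON file loaded into a dict. The required-key list is a placeholder holding only `rms_norm_eps`, the one key reported in this thread; the real set a converter expects may differ. The function names and the example config values are hypothetical.

```python
import json

# Assumption: only 'rms_norm_eps' is reported missing in this thread.
# Extend this list with whatever keys your converter actually requires.
REQUIRED_KEYS = ["rms_norm_eps"]


def load_config(path):
    """Load a model config from a JSON file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the config dict."""
    return [key for key in required if key not in config]


if __name__ == "__main__":
    # Made-up config for illustration, not the real Persimmon config.
    example_config = {"hidden_size": 4096, "layer_norm_eps": 1e-5}
    print(find_missing_keys(example_config))
```

An empty list means every key in `REQUIRED_KEYS` is present; anything else names the keys the conversion would trip over.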

maddes8cht commented 10 months ago

A full set of Llama.cpp-compatible .gguf files is available at https://huggingface.co/maddes8cht/adept-persimmon-8b-base-gguf and https://huggingface.co/maddes8cht/adept-persimmon-8b-chat-gguf. For the moment, CUDA acceleration does not seem to work, so you need to use -ngl 0 with the cuBLAS versions.
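
As a rough illustration of the `-ngl 0` workaround, the sketch below assembles a Llama.cpp command line with GPU offloading disabled. The binary path `./main`, the model filename, and the helper name are assumptions for illustration; substitute the actual binary from your build and the .gguf file you downloaded.

```python
import shlex


def build_llama_cmd(binary, model_path, prompt, gpu_layers=0):
    """Build an argument list for a Llama.cpp run (hypothetical helper).

    gpu_layers=0 maps to '-ngl 0', which keeps all layers on the CPU.
    """
    return [
        binary,
        "-m", model_path,         # path to the .gguf model file
        "-p", prompt,             # prompt text
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
    ]


if __name__ == "__main__":
    # Assumed binary name and placeholder model filename.
    cmd = build_llama_cmd("./main", "persimmon-8b-chat.gguf", "Hello")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` avoids shell-quoting problems with prompts that contain spaces or quotes.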