Closed AmineDiro closed 1 year ago
Hello, solves #402.

This is a temporary fix for supporting the Llama-2 70B model. I wanted to open a draft PR to get your feedback on this implementation for supporting the `n_gqa` param:

- `n_gqa` as an optional param in `ModelParameters`
- a `LlamaModelVersion` enum, akin to the `e_model` enum in llama.cpp
- `n_head_kv` for `K` and `V` instead of `n_head`

Here is the `llama-2-70B--chat.ggmlv3.q4_0.bin` model loaded on an A100 GPU:
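For context on the change: grouped-query attention shares each key/value head across several query heads, so the `K` and `V` tensors are sized by `n_head_kv = n_head / n_gqa` rather than `n_head`. A minimal sketch of the idea, with hypothetical names that are illustrative only and not the crate's actual API:

```rust
// Hypothetical sketch: deriving the number of KV heads from an optional
// n_gqa hyperparameter. Names are illustrative, not the real crate API.
struct ModelParameters {
    /// Grouped-query attention factor: query heads per KV head.
    /// `None` means standard multi-head attention (n_head_kv == n_head).
    n_gqa: Option<usize>,
}

fn n_head_kv(n_head: usize, params: &ModelParameters) -> usize {
    match params.n_gqa {
        Some(n_gqa) => {
            // Query heads must divide evenly into KV groups.
            assert!(n_head % n_gqa == 0, "n_head must be divisible by n_gqa");
            n_head / n_gqa
        }
        None => n_head,
    }
}

fn main() {
    // Llama-2 70B: 64 query heads with n_gqa = 8 -> 8 KV heads.
    assert_eq!(n_head_kv(64, &ModelParameters { n_gqa: Some(8) }), 8);
    // Smaller Llama-2 models: no GQA, KV heads equal query heads.
    assert_eq!(n_head_kv(32, &ModelParameters { n_gqa: None }), 32);
    println!("ok");
}
```

With this shape, only the `K`/`V` projection and cache sizes change; the query projection still uses `n_head`, which is why the fix can stay localized to the hyperparameter plumbing.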
Looks good. Some small nitpicks, but if the CI passes it should be good to go 👍
@LLukas22 Thanks for the review 👍🏼 !
Thanks for implementing this :D