In run.c, I found something like this:
```c
// qkv matmuls for this position
matmul(s->q, s->xb, w->wq + l*dim*dim, dim, dim);
matmul(s->k, s->xb, w->wk + l*dim*kv_dim, dim, kv_dim);
matmul(s->v, s->xb, w->wv + l*dim*kv_dim, dim, kv_dim);
```
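If I read run.c correctly, matmul is a plain row-major matrix-vector product, annotated upstream with `// W (d,n) @ x (n,) -> xout (d,)`. A minimal sketch of it (the upstream version also carries an OpenMP pragma, omitted here):

```c
// Sketch of run.c's matmul: xout (d,) = W (d,n) @ x (n,)
// so the last two arguments are the input size n and the output size d.
void matmul(float* xout, float* x, float* w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];  // dot product of row i of W with x
        }
        xout[i] = val;
    }
}
```

So in the three calls above, wq maps dim -> dim, while wk and wv map dim -> kv_dim, meaning K and V are smaller than Q whenever kv_dim < dim.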
Thanks for your question. I found this issue in the original llama repo quite interesting. We probably need to revise and adapt our implementation. @magician-blue, did you implement a similar approach in your recent PR?
I remember Llama 2 uses grouped-query attention (GQA). In llama2.c, I found there are kv_heads and kv_dim.
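To make that concrete, here is a minimal standalone sketch of how run.c derives kv_dim and the sharing factor kv_mul from n_heads and n_kv_heads; the config values below are hypothetical examples, not taken from any particular checkpoint:

```c
#include <stdio.h>

int main(void) {
    int dim = 4096;      // hypothetical model dim
    int n_heads = 32;    // query heads
    int n_kv_heads = 8;  // key/value heads (GQA; equals n_heads for plain MHA)

    int head_size = dim / n_heads;              // 128
    int kv_dim = (dim * n_kv_heads) / n_heads;  // 1024 = n_kv_heads * head_size
    int kv_mul = n_heads / n_kv_heads;          // 4 query heads share each kv head

    printf("head_size=%d kv_dim=%d kv_mul=%d\n", head_size, kv_dim, kv_mul);
    return 0;
}
```

With GQA, each of the n_kv_heads key/value heads is shared by kv_mul query heads, which is why wk and wv are only dim x kv_dim while wq stays dim x dim.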