Closed AmineDiro closed 1 year ago
Hello, solves #402.

This is a temporary fix for supporting the Llama-2 70B model. I wanted to open a draft PR to get your feedback on this implementation for supporting the `n_gqa` param:

- `n_gqa` as an optional param in `ModelParameters`
- a `LlamaModelVersion` enum, akin to the `e_model` enum in llama.cpp
- `n_head_kv` for `K` and `V` instead of `n_head`

Here is the `llama-2-70B--chat.ggmlv3.q4_0.bin` model loaded on an A100 GPU:
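For context on the change: grouped-query attention shares each key/value head across several query heads, so the `K` and `V` tensors are sized by `n_head_kv = n_head / n_gqa` rather than `n_head`. A minimal sketch of the idea, with hypothetical names that are illustrative only and not the crate's actual API:

```rust
// Hypothetical sketch: deriving the number of KV heads from an optional
// n_gqa hyperparameter. Names are illustrative, not the real crate API.
struct ModelParameters {
    /// Grouped-query attention factor: query heads per KV head.
    /// `None` means standard multi-head attention (n_head_kv == n_head).
    n_gqa: Option<usize>,
}

fn n_head_kv(n_head: usize, params: &ModelParameters) -> usize {
    match params.n_gqa {
        Some(n_gqa) => {
            // Query heads must divide evenly into KV groups.
            assert!(n_head % n_gqa == 0, "n_head must be divisible by n_gqa");
            n_head / n_gqa
        }
        None => n_head,
    }
}

fn main() {
    // Llama-2 70B: 64 query heads with n_gqa = 8 -> 8 KV heads.
    assert_eq!(n_head_kv(64, &ModelParameters { n_gqa: Some(8) }), 8);
    // Smaller Llama-2 models: no GQA, KV heads equal query heads.
    assert_eq!(n_head_kv(32, &ModelParameters { n_gqa: None }), 32);
    println!("ok");
}
```

With this shape, only the `K`/`V` projection and cache sizes change; the query projection still uses `n_head`, which is why the fix can stay localized to the hyperparameter plumbing.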
Looks good. Some small nitpicks, but if the CI passes it should be good to go 👍
@LLukas22 Thanks for the review 👍🏼 !
Thanks for implementing this :D