srush / llama2.rs

A fast llama2 decoder in pure Rust.
MIT License
995 stars, 54 forks

Non-mmap'ed weights #34

Closed: srush closed this issue 10 months ago

srush commented 10 months ago

@rachtsingh

For the Cuda code, I convert the model to a different form, let's call it QCudaWeights. However, since it is not mmap'ed, it now has an owner, and I'm not sure how to create a LlamaModel from this.

pub struct LlamaModel {
    mmap: Mmap,
    pub config: Config,
    pub weights: &'static TWeights,
}

Any ideas for how to have a LlamaModel that can handle either owned or non-owned weights? Mmap is pretty interesting as a concept that breaks borrowing in Rust.
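
(For context, the mmap trick in the current struct presumably works roughly like the sketch below: the bytes inside the mapping get reinterpreted as a &'static TWeights while the Mmap itself is kept alive in the same struct. The load function, the offset argument, and the stub types here are illustrative guesses, not the repo's actual code.)

use memmap2::Mmap;
use std::fs::File;

struct Config;   // stand-in for the real Config
struct TWeights; // stand-in for the real quantized weight layout

pub struct LlamaModel {
    mmap: Mmap,                     // keeps the mapping (and the file) alive
    pub config: Config,
    pub weights: &'static TWeights, // actually points into `mmap`
}

impl LlamaModel {
    fn load(path: &str, config: Config, offset: usize) -> std::io::Result<LlamaModel> {
        let file = File::open(path)?;
        // Map the checkpoint file read-only.
        let mmap = unsafe { Mmap::map(&file)? };
        // Reinterpret the bytes at `offset` inside the mapping as the weight struct.
        // The 'static lifetime is only sound because `mmap` lives in the same struct
        // and is never dropped while `weights` is in use.
        let weights = unsafe { &*(mmap.as_ptr().add(offset) as *const TWeights) };
        Ok(LlamaModel { mmap, config, weights })
    }
}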

rachtsingh commented 10 months ago

I think maybe we use a trait to define the interface for a “LlamaModel”, and change this struct to a DiskLlamaModel that implements it? It will probably depend on how QCudaWeights is set up (the trait might need a lifetime parameter, I think).

I'm not really much of a Rust programmer, though, so I would definitely take it all with a grain of salt.
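
For what it's worth, a rough sketch of that trait split: one mmap-backed implementation and one that owns its converted weights. All the names here (Llama, DiskLlamaModel, CudaLlamaModel) are placeholders, and Config, TWeights, and QCudaWeights stand in for the real types.

use memmap2::Mmap;

struct Config;       // stand-ins for the real types in the repo
struct TWeights;
struct QCudaWeights;

// Common interface both variants would implement.
trait Llama {
    fn config(&self) -> &Config;
    fn generate(&self, prompt: &str) -> String;
}

// Existing mmap-backed model: weights are borrowed out of the mapping.
struct DiskLlamaModel {
    mmap: Mmap,
    config: Config,
    weights: &'static TWeights,
}

// CUDA model: weights are converted, and therefore owned.
struct CudaLlamaModel {
    config: Config,
    weights: QCudaWeights,
}

impl Llama for DiskLlamaModel {
    fn config(&self) -> &Config { &self.config }
    fn generate(&self, _prompt: &str) -> String { todo!("CPU inference over self.weights") }
}

impl Llama for CudaLlamaModel {
    fn config(&self) -> &Config { &self.config }
    fn generate(&self, _prompt: &str) -> String { todo!("GPU inference over self.weights") }
}

// Call sites take a &dyn Llama (or a generic) and don't care which backend they got.
fn run(model: &dyn Llama, prompt: &str) -> String {
    model.generate(prompt)
}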

srush commented 10 months ago

Oh, you know what, I'll just compile-time gate it. Kind of lame, but I don't ever need both.
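
A minimal sketch of that kind of compile-time gate, assuming a Cargo feature called cuda and the types from earlier in the thread (the feature name and field layout are guesses, not the repo's actual setup):

// Cargo.toml (hypothetical):
// [features]
// cuda = []

#[cfg(not(feature = "cuda"))]
pub struct LlamaModel {
    mmap: Mmap,
    pub config: Config,
    pub weights: &'static TWeights, // borrowed out of the mapping
}

#[cfg(feature = "cuda")]
pub struct LlamaModel {
    pub config: Config,
    pub weights: QCudaWeights, // owned, converted form
}

Only one definition exists per build, so downstream code keeps using LlamaModel unchanged; the trade-off, as noted, is that a single binary can't support both backends.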