srush / llama2.rs

A fast llama2 decoder in pure Rust.
MIT License

Configure model size constants using cfg attrs #12

Closed rachtsingh closed 1 year ago

rachtsingh commented 1 year ago

Hey Sasha,

I was trying out llama2.rs and wanted to swap between the 7B/13B models on the fly, and I think using conditional compilation (cfg attributes) here makes that switch a bit easier.
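
For reference, here's a minimal sketch of the idea; the feature names (`model_7b`, `model_13b`) are just examples rather than anything copied from this PR, and the constants are the published LLaMA-2 7B/13B architecture sizes:

```rust
// Sketch of the cfg-attr approach. Feature names are illustrative; the
// constants are the published LLaMA-2 7B / 13B sizes.

#[cfg(feature = "model_7b")]
mod config {
    pub const DIM: usize = 4096;         // embedding dimension
    pub const HIDDEN_DIM: usize = 11008; // FFN hidden dimension
    pub const N_LAYERS: usize = 32;      // transformer layers
    pub const N_HEADS: usize = 32;       // attention heads
}

#[cfg(feature = "model_13b")]
mod config {
    pub const DIM: usize = 5120;
    pub const HIDDEN_DIM: usize = 13824;
    pub const N_LAYERS: usize = 40;
    pub const N_HEADS: usize = 40;
}

use config::*;

fn main() {
    // Because the sizes are compile-time constants, weight buffers can be
    // statically sized instead of carried around at runtime.
    println!("dim={DIM} hidden={HIDDEN_DIM} layers={N_LAYERS} heads={N_HEADS}");
}
```

You declare the features in `Cargo.toml` and pick one at build time, e.g. `cargo run --release --features model_13b`. If neither feature is enabled the build fails because `config` is never defined, which acts as a reasonable guard against forgetting to choose a model.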

I also spent some time trying to optimize the SIMD code, but it already seems really fast; I couldn't find any easy optimizations.

srush commented 1 year ago

oh this is great! I had no idea it worked this way.

srush commented 1 year ago

Maybe I will let it switch between quantized / non-quantized this way too.
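
Something roughly like this could work (the `quantized` feature name and the types below are just a sketch, not what's actually in the repo):

```rust
// Illustrative only: the `quantized` feature and these types are hypothetical.

#[cfg(feature = "quantized")]
mod weights {
    /// A quantized block: 32 4-bit values packed two per byte, plus a scale.
    pub struct QBlock {
        pub packed: [u8; 16],
        pub scale: f32,
    }
    pub type WeightRow = Vec<QBlock>;
}

#[cfg(not(feature = "quantized"))]
mod weights {
    /// Full-precision path: a row of f32 weights.
    pub type WeightRow = Vec<f32>;
}

use weights::WeightRow;

fn main() {
    // Downstream code is written against `WeightRow`; the quantized vs.
    // full-precision representation is chosen at compile time.
    let row: WeightRow = WeightRow::new();
    println!("row length: {}", row.len());
}
```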