srush / llama2.rs

A fast llama2 decoder in pure Rust.
MIT License

Shrink the IN dim with respect to the SIMD width #36

Open huoyushequ opened 10 months ago

huoyushequ commented 10 months ago

`SIMD_8` is used in the `matvec` method of `QLinear`, so the input `x` with shape `(B, IN)` should be transformed into `[[Simd<f32, 8>; B]; IN/8]`.
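A minimal sketch of that regrouping on stable Rust. Assumptions: `pack_input` and `LANES` are hypothetical names (not from llama2.rs), and `[f32; 8]` stands in for `Simd<f32, 8>` because `std::simd` still requires the nightly `portable_simd` feature. Stable Rust cannot write `IN / 8` in a generic array length, so the group count is passed as a separate const parameter `IN8` and checked at runtime.

```rust
/// Lane count; stands in for the 8 in `Simd<f32, 8>`.
const LANES: usize = 8;

/// Hypothetical helper: regroup a row-major `(B, IN)` input into
/// `IN8 = IN / 8` groups, each holding one 8-lane vector per batch row.
/// `IN8` is passed explicitly because stable Rust cannot compute `IN / 8`
/// in a generic type; the relationship is checked at runtime.
fn pack_input<const B: usize, const IN: usize, const IN8: usize>(
    x: &[[f32; IN]; B],
) -> [[[f32; LANES]; B]; IN8] {
    assert_eq!(IN8 * LANES, IN, "IN8 must equal IN / 8");
    let mut out = [[[0.0f32; LANES]; B]; IN8];
    for (g, group) in out.iter_mut().enumerate() {
        for (b, lanes) in group.iter_mut().enumerate() {
            // Features g*8 .. g*8+8 of row b become one 8-lane vector.
            lanes.copy_from_slice(&x[b][g * LANES..(g + 1) * LANES]);
        }
    }
    out
}

fn main() {
    // B = 2 rows, IN = 16 features, so IN / 8 = 2 groups.
    let mut x = [[0.0f32; 16]; 2];
    for (b, row) in x.iter_mut().enumerate() {
        for (i, v) in row.iter_mut().enumerate() {
            *v = (b * 100 + i) as f32;
        }
    }
    let packed = pack_input::<2, 16, 2>(&x);
    // Group 1, row 0 holds features 8..16 of row 0.
    println!("{:?}", packed[1][0]); // [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
}
```

With nightly `std::simd`, each `[f32; 8]` group could be turned into a `Simd<f32, 8>` with `Simd::from_array`.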

huoyushequ commented 10 months ago

Changing from `[[Simd<f32, 8>; B]; IN]` to `[[Simd<f32, 8>; B]; IN/8]` greatly reduces stack usage and slightly improves inference speed.
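The stack saving is plain size arithmetic: an 8-lane `f32` vector is 32 bytes, so `IN` vectors per batch row take 8 times the storage needed for `IN` floats, while `IN / 8` vectors hold exactly `IN` floats. A quick check with `std::mem::size_of`, again using `[f32; 8]` as a 32-byte stand-in for `Simd<f32, 8>`; `B = 1` and `IN = 4096` are illustrative values, not necessarily the model's actual dimensions.

```rust
use std::mem::size_of;

// `[f32; 8]` stands in for `Simd<f32, 8>`; both are 32 bytes.
type Lane8 = [f32; 8];

fn main() {
    const B: usize = 1;
    const IN: usize = 4096; // illustrative size, chosen for this sketch
    let old = size_of::<[[Lane8; B]; IN]>(); // IN vectors per row
    let new = size_of::<[[Lane8; B]; IN / 8]>(); // IN / 8 vectors per row
    // Prints: old: 131072 bytes, new: 16384 bytes, ratio: 8
    println!("old: {old} bytes, new: {new} bytes, ratio: {}", old / new);
}
```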

srush commented 10 months ago

This is a good change, does it impact speed?

I was considering adding a Rust feature gate that allows generic constant arithmetic, but this solution is probably better for now.
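`generic_const_exprs` is an incomplete, nightly-only feature. One stable alternative, sketched here under the assumption that the caller supplies `IN8 = IN / 8` as its own const parameter, is to verify the relationship at compile time through an associated const whose `assert!` fails const evaluation for a bad combination. `AssertGroups` and `PackedInput` are hypothetical names, and `[f32; 8]` stands in for `Simd<f32, 8>`.

```rust
/// Lane count; stands in for the 8 in `Simd<f32, 8>`.
const LANES: usize = 8;

/// Hypothetical compile-time check: evaluating `OK` is a const-eval error
/// unless `IN == IN8 * LANES`.
struct AssertGroups<const IN: usize, const IN8: usize>;

impl<const IN: usize, const IN8: usize> AssertGroups<IN, IN8> {
    const OK: () = assert!(IN == IN8 * LANES, "IN8 must equal IN / 8");
}

/// Hypothetical packed input buffer shaped like `[[Simd<f32, 8>; B]; IN / 8]`.
struct PackedInput<const B: usize, const IN: usize, const IN8: usize> {
    data: [[[f32; LANES]; B]; IN8],
}

impl<const B: usize, const IN: usize, const IN8: usize> PackedInput<B, IN, IN8> {
    fn zeroed() -> Self {
        // Naming the associated const forces the check for each instantiation.
        let () = AssertGroups::<IN, IN8>::OK;
        PackedInput { data: [[[0.0; LANES]; B]; IN8] }
    }
}

fn main() {
    let p = PackedInput::<1, 4096, 512>::zeroed();
    // Prints: 512 groups of 32 bytes
    println!("{} groups of {} bytes", p.data.len(), std::mem::size_of_val(&p.data[0]));
    // PackedInput::<1, 4096, 500>::zeroed(); // rejected when the crate is built
}
```

The trade-off is that this is a post-monomorphization check: it surfaces when a bad instantiation is compiled, not as a type error in the signature.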

huoyushequ commented 10 months ago

I tried `#![feature(generic_const_exprs)]` together with `#![allow(incomplete_features)]`, but it generates a bunch of warnings. On my desktop, `achieved tok/s` went from 4.8163757 to 5.0361977.
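For scale, the two reported throughputs work out to roughly a 4.6% speedup, or about 9 ms less per token:

```math
\frac{5.0361977}{4.8163757} \approx 1.046,
\qquad
\frac{1}{4.8163757}\,\text{s} \approx 207.6\,\text{ms}
\;\longrightarrow\;
\frac{1}{5.0361977}\,\text{s} \approx 198.6\,\text{ms}
```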

srush commented 9 months ago

Sorry! I'll get this checked in. I just made a bunch of other changes that I need to merge first.

huoyushequ commented 9 months ago

This project is a great place to learn Rust, Llama, and CUDA (Triton). Much appreciated; I hope to contribute something helpful to the project.

srush commented 9 months ago

Would love any contribution, I'm also learning Rust and Triton on the fly.

What if we try this library? It seems pretty cool. https://docs.rs/typenum/latest/typenum/

srush commented 9 months ago

Another idea would be to explore adding testing. Not sure how unit tests work in Rust, but it would be nice to have these for small sizes.
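Rust's built-in harness runs any function marked `#[test]` via `cargo test`, and unit tests conventionally live in a `#[cfg(test)] mod tests` block in the same file. A minimal sketch of a small-size test, assuming hypothetical `matvec_naive` and `matvec_blocked` functions (not llama2.rs's `QLinear::matvec`, and without quantization or the batch dimension): the blocked version keeps 8 per-lane partial sums the way a SIMD kernel would, and the test checks it against a plain dot product.

```rust
const LANES: usize = 8;

/// Reference matvec: out[o] = sum_i w[o][i] * x[i].
fn matvec_naive<const IN: usize, const OUT: usize>(
    w: &[[f32; IN]; OUT],
    x: &[f32; IN],
) -> [f32; OUT] {
    let mut out = [0.0f32; OUT];
    for (o, row) in w.iter().enumerate() {
        out[o] = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
    out
}

/// Hypothetical lane-blocked matvec: `x` is pre-grouped into IN8 chunks of 8.
fn matvec_blocked<const IN: usize, const OUT: usize, const IN8: usize>(
    w: &[[f32; IN]; OUT],
    x: &[[f32; LANES]; IN8],
) -> [f32; OUT] {
    assert_eq!(IN8 * LANES, IN, "IN8 must equal IN / 8");
    let mut out = [0.0f32; OUT];
    for (o, row) in w.iter().enumerate() {
        let mut acc = [0.0f32; LANES]; // one partial sum per lane
        for (g, xg) in x.iter().enumerate() {
            for l in 0..LANES {
                acc[l] += row[g * LANES + l] * xg[l];
            }
        }
        out[o] = acc.iter().sum(); // horizontal reduction across lanes
    }
    out
}

fn main() {
    let w = [[1.0f32; 16]; 2];
    let x = [2.0f32; 16];
    let xb = [[2.0f32; LANES]; 2];
    // Prints: [32.0, 32.0] [32.0, 32.0]
    println!("{:?} {:?}", matvec_naive(&w, &x), matvec_blocked::<16, 2, 2>(&w, &xb));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn blocked_matches_naive_small() {
        // IN = 16, OUT = 3: small enough to check by hand.
        let mut w = [[0.0f32; 16]; 3];
        let mut x = [0.0f32; 16];
        for i in 0..16 {
            x[i] = i as f32 * 0.5;
            for o in 0..3 {
                w[o][i] = (o + i) as f32 * 0.25;
            }
        }
        let mut xb = [[0.0f32; LANES]; 2];
        for i in 0..16 {
            xb[i / LANES][i % LANES] = x[i];
        }
        let want = matvec_naive(&w, &x);
        let got = matvec_blocked::<16, 3, 2>(&w, &xb);
        for (a, b) in want.iter().zip(&got) {
            assert!((a - b).abs() < 1e-4, "{a} vs {b}");
        }
    }
}
```

Because the two versions sum in a different order, the test compares with a small tolerance rather than exact equality.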

huoyushequ commented 9 months ago

> Would love any contribution, I'm also learning Rust and Triton on the fly.
>
> What if we try this library? It seems pretty cool. https://docs.rs/typenum/latest/typenum/

Importing `typenum` would unnecessarily complicate the repo and make the code unintuitive just to work around the limitations of `generic_const_exprs`.

huoyushequ commented 9 months ago

> Another idea would be to explore adding testing. Not sure how unit tests work in rust, but it would be nice to have these for small sizes.

I'm happy to write some unit tests after I finish carefully reading the source code.