srush / llama2.rs

A fast llama2 decoder in pure Rust.
MIT License

nice work, some questions #1

Open lucasjinreal opened 1 year ago

lucasjinreal commented 1 year ago

Does it use any matrix-computation acceleration framework from Rust's libraries? Any plan to take it further, for instance making it as popular as ggml?

srush commented 1 year ago

It's using Rayon for data-parallel matrix-vector multiplication, but no other libraries.
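For context, the Rayon pattern here is to parallelize over output rows and keep each dot product sequential. This is only a minimal sketch of that idea (names and layout are illustrative, not the repo's actual code):

```rust
use rayon::prelude::*;

/// Multiply a (rows x cols) row-major matrix by a vector.
/// Rayon splits the output rows across threads; each row's dot product is sequential.
fn matvec(out: &mut [f32], mat: &[f32], x: &[f32], rows: usize, cols: usize) {
    assert_eq!(mat.len(), rows * cols);
    assert_eq!(x.len(), cols);
    assert_eq!(out.len(), rows);
    out.par_iter_mut().enumerate().for_each(|(r, o)| {
        let row = &mat[r * cols..(r + 1) * cols];
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    });
}
```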

See the Rust library Candle, which has a full implementation with matrix multiplies.

Was thinking I would try implementing GGML-style quantization. Any other features you would want?
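For reference, the simplest GGML-style format (Q8_0) stores weights in blocks of 32 int8 values with one f32 scale per block. A rough sketch of that idea in Rust, purely illustrative and not a commitment to this exact layout:

```rust
/// One GGML-style Q8_0 block: 32 weights stored as i8 plus one f32 scale.
/// (ggml also has 4-bit variants with a similar block structure.)
const BLOCK: usize = 32;

struct BlockQ8 {
    scale: f32,
    qs: [i8; BLOCK],
}

fn quantize(weights: &[f32]) -> Vec<BlockQ8> {
    weights
        .chunks(BLOCK)
        .map(|chunk| {
            // Pick the scale so the largest magnitude in the block maps to 127.
            let max = chunk.iter().fold(0f32, |m, &w| m.max(w.abs()));
            let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
            let mut qs = [0i8; BLOCK];
            for (q, &w) in qs.iter_mut().zip(chunk) {
                *q = (w / scale).round() as i8;
            }
            BlockQ8 { scale, qs }
        })
        .collect()
}

fn dequantize(b: &BlockQ8, i: usize) -> f32 {
    b.qs[i] as f32 * b.scale
}
```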

lucasjinreal commented 1 year ago

Yes, I'd like to ask some more questions:

  1. Will it support ARM? Furthermore, ARM with fp16? If so, Macs with M1 or M2 chips could run it very happily.
  2. I think Rust can build a very impressive ecosystem compared with C or C++, thanks to its extensive libraries like termUI or Tauri etc., which make it easy to build a top-level chat UI that integrates seamlessly with an inference core in Rust, without pain. Would you consider making this an inference core so that people can build their own UIs on top of it?
srush commented 1 year ago
  1. Should work fine with ARM, but currently it is f32 only. (Note, though, this is CPU only, no GPU support.) I'd have to think about how to add f16; one possible sketch is below.
  2. I'm pretty new to Rust, but I think people will come up with lots of ways to use it!
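One low-effort way to get f16 storage on stable Rust is the `half` crate (an assumption here, not something the repo uses): keep weights as f16 and widen to f32 for the arithmetic. That halves memory but doesn't use hardware fp16 math; for that you'd need NEON intrinsics or a backend that supports it.

```rust
use half::f16; // `half` crate: IEEE-754 binary16 on stable Rust (assumed dependency)

/// Dot product with f16 weights, widening each weight to f32 for the math.
fn dot_f16(row: &[f16], x: &[f32]) -> f32 {
    row.iter().zip(x).map(|(w, xi)| w.to_f32() * xi).sum()
}
```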