srush / llama2.rs

A fast llama2 decoder in pure Rust.
MIT License

Some llama2 finetunes don't seem to work #22

Open balisujohn opened 10 months ago

balisujohn commented 10 months ago

I got https://huggingface.co/TheBloke/Llama-2-13B-GPTQ to work, but using exactly the same strategy for https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ, I get the following error:

RUST_BACKTRACE=1 target/release/llama2_rs -c l13b.act64.bin -t 0.0 -s 25 -p "Hello to all the cool people out there who "  --debug
Configuration: Config { dim: 5120, hidden_dim: 13824, n_layers: 40, n_heads: 40, n_kv_heads: 40, vocab_size: 32000, seq_len: 2048, shared_weight: false }
thread 'main' panicked at src/main.rs:106:9:
assertion `left == right` failed
  left: 8556630020
 right: 8556548100

The offset always seems to be 81920, which is 40*2048; both of those values appear in the constants.rs file for the 13B models, so maybe that's relevant.
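
For context, the panic looks like a checkpoint-size consistency check: the loader computes how many bytes the weights should occupy from the hard-coded configuration and compares that with the actual file length. Below is a minimal sketch of that kind of check, with hypothetical names and only the embedding term spelled out; the real assertion in src/main.rs covers all the weight tensors and will differ in detail.

```rust
use std::fs;

// Hypothetical constants mirroring the 13B configuration printed above;
// the real values live in the repo's constants.rs.
const DIM: usize = 5120;
const VOCAB_SIZE: usize = 32000;
const BYTES_PER_WEIGHT: usize = 4;

fn main() -> std::io::Result<()> {
    // With shared_weight: false there are separate input and output
    // embedding tables, so the vocabulary contributes 2 * vocab * dim weights.
    let embedding_bytes = 2 * VOCAB_SIZE * DIM * BYTES_PER_WEIGHT;

    // The real check also adds the per-layer attention/FFN/RMSNorm tensors;
    // they are omitted here for brevity.
    let expected_bytes = embedding_bytes /* + layer weights + ... */;

    let actual_bytes = fs::metadata("l13b.act64.bin")?.len() as usize;

    // A finetune whose vocabulary grew by even a couple of tokens makes the
    // file larger than expected, and an assert_eq! like this one panics.
    assert_eq!(actual_bytes, expected_bytes, "checkpoint size mismatch");
    Ok(())
}
```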

balisujohn commented 10 months ago

(intuition tells me it's because this is a LoRA finetune)

srush commented 10 months ago

Oh weird, for some reason they added 2 additional word tokens: 2 * 5120 * 2 * 4 bytes.

I'll take them out for now, and think about a way to handle it better.
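
The arithmetic checks out against the reported offset: two extra vocabulary rows in both the input and output embedding tables account for exactly the 81920-byte discrepancy. A quick sanity check against the numbers from the panic message:

```rust
fn main() {
    let dim = 5120_u64;           // model dimension from the printed Config
    let extra_tokens = 2_u64;     // the two added word tokens
    let tables = 2_u64;           // separate input and output embeddings
    let bytes_per_weight = 4_u64; // fp32 embeddings

    let extra_bytes = extra_tokens * dim * tables * bytes_per_weight;
    assert_eq!(extra_bytes, 81_920);

    // Matches the difference between the two sides of the failed assertion.
    assert_eq!(8_556_630_020_u64 - 8_556_548_100_u64, extra_bytes);
}
```

One possible longer-term fix, purely as a suggestion, would be to derive the vocabulary size from the checkpoint itself rather than a compile-time constant, so finetunes with a slightly larger vocabulary load without tripping the size check.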