Closed — LLukas22 closed this 1 year ago
Sample command for an 8k context with Llama 2:
cargo run --release --features cublas -- infer -a llama -m "C:\Users\lkreu\Downloads\llama-2-13b-chat.ggmlv3.q5_K_M.bin" -p "A llama riding a crab" --use-gpu --rope-scaling 0.5 --num-ctx-tokens 8192 --ignore-eos --stats
Sit back and get some coffee ☕ (8192 tokens is a lot of tokens to generate.)
A 16k context is also possible by setting rope-scaling to 0.25, but then I don't have enough VRAM to infer on my GPU.
The generated text gets repetitive after some time, but I guess that's a sampler/settings issue. lama_story.txt
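The context extension discussed above is linear RoPE scaling (position interpolation): multiplying each position index by the scaling factor compresses a longer sequence into the angle range the model saw during training, so a factor of 0.5 stretches a 4k-trained model to 8k and 0.25 to 16k. A minimal sketch of the idea (function names are illustrative, not the crate's API):

```python
import math

def rope_frequencies(dim, base=10000.0):
    # One inverse frequency per pair of embedding dimensions,
    # following the standard RoPE formulation.
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rope_angles(position, dim, base=10000.0, scaling=1.0):
    # Linear scaling: shrink the position index before computing
    # rotation angles, so positions beyond the trained context
    # still land inside the trained angle range.
    scaled = position * scaling
    return [scaled * f for f in rope_frequencies(dim, base)]

# With scaling 0.5, position 8191 produces exactly the rotation
# angles that position 4095.5 would produce unscaled.
a = rope_angles(8191, 128, scaling=0.5)
b = rope_angles(4095.5, 128, scaling=1.0)
assert a == b
```

This also explains the repetition the comment mentions: interpolated positions are denser than anything seen in training, so sampling settings often need retuning at long contexts.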
Great work! I just tested it with LLongMa-2; it's a bit finicky, but that shouldn't be a problem for us. I've revised the names a little to match llama.cpp / refer to frequency, but the rest is the same. Will merge once CI passes 🚀
Closes https://github.com/rustformers/llm/issues/378.

- Adds custom context scaling to llama, falcon, gpt-j and gpt-neox.
- Adds an Option<ggml::CustomRoPEArguments> parameter to the ModelParameters.
- Adds the optional --rope-base and --rope-scaling CLI parameters.
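A rough sketch of how such optional RoPE overrides could be threaded through the model parameters; the field names and defaults here are assumptions for illustration, not the crate's actual API:

```rust
// Hypothetical mirror of ggml::CustomRoPEArguments; real field
// names in the crate may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CustomRoPEArguments {
    base: f32,    // --rope-base (10000.0 is the usual RoPE default)
    scaling: f32, // --rope-scaling (1.0 = no context extension)
}

// Trimmed-down stand-in for ModelParameters.
struct ModelParameters {
    rope: Option<CustomRoPEArguments>,
}

fn main() {
    // None keeps the model's trained defaults.
    let default_params = ModelParameters { rope: None };
    assert!(default_params.rope.is_none());

    // --rope-scaling 0.5 roughly doubles the usable context window.
    let extended = ModelParameters {
        rope: Some(CustomRoPEArguments { base: 10000.0, scaling: 0.5 }),
    };
    assert_eq!(extended.rope.unwrap().scaling, 0.5);
}
```

Using an Option means models that never opt in see no behavior change, which is why the CLI flags can stay optional.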