Added draft model rope scale to chat example

turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

MIT License


Closed: SinanAkkoyun closed this 10 months ago

SinanAkkoyun commented 10 months ago

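When a custom RoPE scale is set, the chat.py example now applies the same RoPE scale to the draft model as well. This is needed for speculative decoding with DeepSeek models, where the draft model must use the same scaling as the main model.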
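The change amounts to propagating one user-supplied setting to a second model config. Below is a minimal sketch under stated assumptions. `ModelConfig`, `apply_rope_scale`, and `rope_angle` are hypothetical names for illustration only, not exllamav2's API. The angle formula assumes plain linear RoPE scaling, where positions are divided by the scale factor. It shows why a draft model left at the default scale would see different positional rotations than the main model for the same token position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical stand-in for a model config; field names are illustrative only.
    model_dir: str
    rope_scale: float = 1.0  # linear RoPE position scale (1.0 = no scaling)


def apply_rope_scale(config: ModelConfig, rope_scale: Optional[float]) -> None:
    """Apply a user-supplied RoPE scale to a config; keep its default when None."""
    if rope_scale is not None:
        config.rope_scale = rope_scale


def rope_angle(position: int, scale: float, dim_index: int,
               head_dim: int = 128, base: float = 10000.0) -> float:
    """Rotation angle of one RoPE frequency under (assumed) linear scaling:
    (position / scale) * base ** (-2 * dim_index / head_dim)."""
    inv_freq = base ** (-2 * dim_index / head_dim)
    return (position / scale) * inv_freq


# Usage: apply the same override to both the main and the draft config,
# mirroring what the change does for the chat example.
user_rope_scale = 4.0
main_cfg = ModelConfig("main-model")
draft_cfg = ModelConfig("draft-model")
for cfg in (main_cfg, draft_cfg):
    apply_rope_scale(cfg, user_rope_scale)
```

If only the main config received the override, the same token position would be rotated by different angles in the two models (for example, position 100 at the lowest-index frequency gives 25.0 with a scale of 4.0 but 100.0 with the default 1.0), so the draft model's predictions would be computed from positional encodings that do not match the main model's.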