turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Added draft model rope scale to chat example #204

Closed: SinanAkkoyun closed this 10 months ago

SinanAkkoyun commented 10 months ago

When a custom RoPE scale is set, the chat.py example now applies the same RoPE scale to the draft model as well. This is needed for drafting with DeepSeek models.
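
As a rough illustration of the idea (not the actual diff from this PR), the sketch below copies a user-supplied RoPE scale onto both the main and the draft config. `SketchConfig`, its `scale_pos_emb` field, and `apply_rope_scale` are hypothetical stand-ins written for this example; the real chat.py and the exllamav2 config class may name and structure things differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model config; not the exllamav2 class."""
    model_dir: str
    scale_pos_emb: float = 1.0  # assumed field name for the RoPE scale factor


def apply_rope_scale(main: SketchConfig,
                     draft: Optional[SketchConfig],
                     rope_scale: Optional[float]) -> None:
    """Apply a custom RoPE scale to the main config and, if present, the draft config."""
    if rope_scale is None:
        # No custom scale requested: leave both configs at their defaults.
        return
    main.scale_pos_emb = rope_scale
    if draft is not None:
        # Keep the draft model's positional scaling in step with the main model.
        draft.scale_pos_emb = rope_scale


# Placeholder paths, for illustration only.
main_cfg = SketchConfig("models/main")
draft_cfg = SketchConfig("models/draft")
apply_rope_scale(main_cfg, draft_cfg, 4.0)
```

With no custom scale (`rope_scale=None`), neither config is touched, so the draft model keeps whatever default it was loaded with.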