turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.19k stars 234 forks

Layer Skip looks interesting #431

Closed SinanAkkoyun closed 2 months ago

SinanAkkoyun commented 2 months ago

https://huggingface.co/papers/2404.16710

Hey! :)

I just came across this, and the self-speculative decoding approach looks promising at first glance. Rough sketch of the idea below.
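
As I understand the paper, the decode loop drafts a few tokens with an early-exit pass over the first layers (same weights, shared LM head), then verifies the whole draft with one full forward pass. Just to illustrate, here is a toy sketch of that loop; `early_exit_logits` and `full_logits` are made-up stand-ins (random logits), not exllamav2 or HF APIs, and this is not a proposed implementation:

```python
import torch

VOCAB = 32  # toy vocabulary size


def full_logits(tokens: list[int]) -> torch.Tensor:
    """Stand-in for a full forward pass; returns [len(tokens), VOCAB] logits."""
    g = torch.Generator().manual_seed(hash(tuple(tokens)) % (2**31))
    return torch.randn(len(tokens), VOCAB, generator=g)


def early_exit_logits(tokens: list[int], exit_layer: int = 4) -> torch.Tensor:
    """Stand-in for a pass that exits after `exit_layer` layers and applies
    the shared LM head early (the cheap 'draft' model in LayerSkip)."""
    g = torch.Generator().manual_seed((hash(tuple(tokens)) + exit_layer) % (2**31))
    return torch.randn(len(tokens), VOCAB, generator=g)


def self_speculative_generate(prompt: list[int], max_new: int = 16, k: int = 4) -> list[int]:
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) Draft k tokens greedily with the early-exit sub-model.
        draft: list[int] = []
        for _ in range(k):
            logits = early_exit_logits(tokens + draft)
            draft.append(int(logits[-1].argmax()))

        # 2) Verify all k draft tokens in one full forward pass.
        verify = full_logits(tokens + draft)
        n_ctx = len(tokens)

        accepted = 0
        correction = None
        for i, tok in enumerate(draft):
            # Full model's greedy prediction at the position preceding draft token i.
            target = int(verify[n_ctx - 1 + i].argmax())
            if target == tok:
                accepted += 1
            else:
                # First mismatch: take the full model's token and discard the rest.
                correction = target
                break

        tokens.extend(draft[:accepted])
        if correction is not None:
            tokens.append(correction)
        else:
            # All drafts accepted: the verification pass gives one bonus token for free.
            tokens.append(int(verify[-1].argmax()))

    return tokens[: len(prompt) + max_new]


print(self_speculative_generate([1, 2, 3], max_new=8, k=4))
```

The appeal over regular speculative decoding is that the draft and verify passes share one set of weights and one cache, so there is no separate draft model to load.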

@turboderp What do you think about it?