creatorrr opened this issue 6 months ago (Open)
Does this code run OK?
@alphanlp no, just pseudocode of their algorithm
+1
As the paper mentions, Self-Extend does not support flash-attn.
We recently added flash-attention support for SelfExtend.
+1, hope to see SelfExtend added to vLLM.
@zhuohan123 thoughts? Pointers on how I can contribute would be awesome :)
In the paper LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, the authors describe a method to extend the context window of any RoPE-based model at inference time, without fine-tuning. I haven't gotten around to testing it myself, but the results reported in the paper seem game-changing.
How could we add support for this in vLLM? Their algorithm, roughly sketched below:
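To make the idea concrete, here is a minimal sketch of the method as I understand it from the paper, not the authors' actual pseudocode or implementation. `apply_rope`, `group_size`, and `neighbor_window` are placeholder names standing in for a generic RoPE helper and the paper's G / w_n hyperparameters.

```python
# Rough sketch of the Self-Extend idea for a single attention head.
# Not the paper's code; apply_rope(x, positions) is an assumed helper
# that rotates x by the given positions.
import torch

def self_extend_scores(q, k, apply_rope, group_size, neighbor_window):
    """q, k: [seq_len, head_dim] tensors before RoPE is applied."""
    seq_len = q.shape[0]
    pos = torch.arange(seq_len)

    # 1) Normal attention: standard RoPE positions, used for nearby tokens.
    q_n = apply_rope(q, pos)
    k_n = apply_rope(k, pos)
    neighbor_scores = q_n @ k_n.T  # [seq_len, seq_len]

    # 2) Grouped attention: floor-divide positions by the group size so
    #    distant tokens fall back into the pretrained position range. The
    #    query positions are shifted so the grouped scores line up with the
    #    neighbor scores at the window boundary.
    shift = neighbor_window - neighbor_window // group_size
    q_g = apply_rope(q, pos // group_size + shift)
    k_g = apply_rope(k, pos // group_size)
    grouped_scores = q_g @ k_g.T

    # 3) Merge: use normal scores within the neighbor window, grouped scores
    #    beyond it; causal masking, scaling, and softmax happen as usual later.
    rel_dist = pos.unsqueeze(1) - pos.unsqueeze(0)  # rel_dist[i, j] = i - j
    return torch.where(rel_dist < neighbor_window, neighbor_scores, grouped_scores)
```

For vLLM, the tricky part is presumably where this merge would live: the two score sets would have to be computed inside the paged-attention kernel rather than by materializing full [seq, seq] matrices, which is probably also where the flash-attn incompatibility mentioned above comes from.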