turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Any examples on long inputs on rope scaled model? #248

Closed · sreeprasannar closed this 1 month ago

sreeprasannar commented 6 months ago

Apart from setting options on ExLlamaV2Config, are any changes to the function calls needed to enable RoPE scaling? I tried a scaling factor of 2.0, but I get a RuntimeError as soon as the input tokens exceed the config's max_seq_len. Hugging Face handles RoPE scaling without changing max_seq_len; with this framework, do I have to raise max_seq_len manually?
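
For reference, a minimal sketch of what I'm doing, assuming `scale_pos_emb` is the config attribute for linear RoPE scaling and that `max_seq_len` has to be raised to match (attribute names may differ between versions, and the model path is a placeholder):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"  # hypothetical path
config.prepare()

# Linear RoPE scaling by 2.0 (assumption: exllamav2's attribute name).
config.scale_pos_emb = 2.0

# exllamav2 appears to size its position buffers from max_seq_len, so it
# seemingly must be raised to match the scaled context, unlike Hugging Face.
config.max_seq_len = 2 * config.max_seq_len

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len)
model.load_autosplit(cache)
```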

sreeprasannar commented 6 months ago

This line in exllamav2 sizes the torch tensor to the model config's max_seq_len, while the PR that originally added this feature to Hugging Face seems to size the same tensor from the input's seq_len (this call specifically handles the case where the input is longer than the config's max_seq_len).
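
To make the difference concrete, here is a rough sketch of the Hugging Face-side behaviour, where the sin/cos cache is keyed to the input length rather than the config's max_seq_len (names here are illustrative, not exllamav2's API):

```python
import torch

def build_scaled_rope_cache(seq_len, head_dim, scaling_factor=2.0, base=10000.0):
    # Illustrative linear RoPE scaling, roughly following the Hugging Face PR:
    # positions are divided by the scaling factor before sin/cos are computed.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float() / scaling_factor
    freqs = torch.outer(t, inv_freq)
    return torch.cos(freqs), torch.sin(freqs)

# Sized from the *input* length, the cache can grow past the trained context:
cos, sin = build_scaled_rope_cache(seq_len=8192, head_dim=128)

# By contrast, a buffer allocated once at config.max_seq_len (as in the
# exllamav2 line referenced above) fails when indexed past that length.
```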