This line in exllamav2 sets the torch tensor to a max length of the model config's `max_seq_len`, while the PR that originally added this feature to Hugging Face seems to use the input's seq_len for the same tensor (that call specifically handles the case where the input is larger than the config's `max_seq_len`).
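
For comparison, this is roughly how the scaling is enabled on the transformers side; the `rope_scaling` dict is the documented interface, though `"dynamic"` being the exact type that PR added is my assumption:

```python
from transformers import AutoModelForCausalLM

# Rough sketch of enabling rope scaling in transformers. The model name is
# just a placeholder, and "dynamic" is my assumption about which scaling
# type the PR added (it adjusts the rope base with input length instead of
# requiring max_position_embeddings to change).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "dynamic", "factor": 2.0},
)
```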
Apart from `ExLlamaV2Config`, is there any change in function calls needed to enable rope scaling? I tried a factor of 2.0, but I get a RuntimeError when the input tokens exceed the config's `max_seq_len`. Hugging Face handles rope scaling without changing `max_seq_len`; for this framework, should I manually set `max_seq_len`?
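
For reference, a minimal sketch of what I'm doing now; `scale_pos_emb` is the field on the config object I assumed takes the linear scaling factor, so please correct me if the factor belongs elsewhere:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"  # placeholder path
config.prepare()

# The factor of 2.0 I tried -- assuming scale_pos_emb is the linear
# rope-scaling field on ExLlamaV2Config.
config.scale_pos_emb = 2.0

# Is this also required, or should rope scaling alone cover it?
# config.max_seq_len = 2 * config.max_seq_len

model = ExLlamaV2(config)
model.load()
```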