Closed by fahadh4ilyas 11 months ago
Yes, the `sin` and `cos` tensors are precomputed in `ExLlama.__init__()`, with the scale given by `config.compress_pos_emb`. If you want to use multiple scales you'd have to either modify the CUDA functions that apply the embeddings or create multiple versions of those tensors, e.g. at load time.

But keep in mind that keys and values cached for one scale will be invalid for any other, so if you're hoping to use one scale up to 2048 tokens and then switch to another as the generation grows longer, this won't work. You'll have to drop the cache at that point and run inference on the sequence-so-far to build the cache for the new scale.

Moreover, the scale you should use is the one the model was fine-tuned on, so it's doubtful that you'll get good results with this approach in any case.
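To see why a cache built under one scale is invalid under another, here is a minimal NumPy sketch (the function name and defaults are illustrative, not ExLlama's API): with linear position scaling, the RoPE angle for position `p` becomes `inv_freq * p / scale`, so the same position maps to different angles under different scales.

```python
import numpy as np

# Minimal sketch (illustrative names, not ExLlama's API): RoPE angles
# for positions `pos` under linear position scaling `scale`.
def rope_angles(pos, dim=8, scale=1.0, base=10000.0):
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(pos / scale, inv_freq)

pos = np.arange(4)
a1 = rope_angles(pos, scale=1.0)
a2 = rope_angles(pos, scale=2.0)

# The same positions get different angles under a different scale, so
# keys rotated under one scale don't match queries under the other.
print(np.allclose(a1, a2))        # False
print(np.allclose(a1[0], a2[0]))  # True: only position 0 is unaffected
```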
Yeah, you are right. I thought we could just use scaling without fine-tuning, but the results are not good even though the perplexity is good and decreasing.
So, I'm trying to make the values of `sin` and `cos` change based on the sequence length. I found that the values of `sin` and `cos` are:
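(The original snippet is not shown here; the precomputation looks roughly like this, sketched in NumPy with illustrative names, assuming the standard linear-scaling RoPE setup.)

```python
import numpy as np

# Sketch of the standard RoPE table precomputation with linear scaling
# (NumPy stand-in for the torch code; names are illustrative).
def precompute_sin_cos(max_seq_len, head_dim, compress_pos_emb=1.0, base=10000.0):
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    t = np.arange(max_seq_len) / compress_pos_emb  # scaled positions
    freqs = np.outer(t, inv_freq)                  # (seq, head_dim / 2)
    return np.sin(freqs), np.cos(freqs)

sin, cos = precompute_sin_cos(max_seq_len=4096, head_dim=128, compress_pos_emb=2.0)
print(sin.shape)  # (4096, 64)
```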
So, for a model with `compress_pos_emb` value `2`, I'm trying to add a `step` parameter:

- when `input_ids.shape[-1] <= 2048`, I set `step=2`
- when `2048 < input_ids.shape[-1] <= 4096`, I set `step=1`

so that I could do this:
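For context, here is what that strided slicing computes (a NumPy sketch, not the actual torch tensors): with `compress_pos_emb=2` the table holds angles for positions 0, 0.5, 1.0, 1.5, ..., so taking every second row recovers the unscaled spacing 0, 1, 2, ... over half the range.

```python
import numpy as np

# NumPy sketch of the `step` idea (illustrative, not the actual torch code).
dim, base, scale, max_pos = 8, 10000.0, 2.0, 8
inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
scaled = np.outer(np.arange(max_pos) / scale, inv_freq)  # positions 0, 0.5, 1.0, ...
unscaled = np.outer(np.arange(max_pos // 2), inv_freq)   # positions 0, 1, 2, ...

# Every 2nd row of the scaled table equals the unscaled table:
print(np.allclose(np.sin(scaled)[::2], np.sin(unscaled)))  # True
```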
But computing perplexity shows that nothing has changed. Is there a step missing here?
EDIT: I realize that the RoPE computation is not in Python, and it seems the CUDA kernel only receives a pointer to the `sin` and `cos` tensors. That's why `self.sin[:,:,::step,:]` has no effect: the strided view's pointer still references the whole underlying tensor. My current way to handle it is to precompute multiple `sin` and `cos` tensors, one per step, at the cost of VRAM.