Closed: pengfeiwu1999 closed this issue 1 year ago
The `gist_offset` parameter is needed because, when caching an instruction, the model needs to know the length of the cached instruction to apply the position embeddings correctly. (It's not needed for T5 due to T5's relative position embedding scheme.) You can see how the position embeddings are shifted here:

`apply_rotary_pos_emb` does not require the gist offset argument because it transforms the `cos` and `sin` tensors, which already have the offset applied.

Note the offset parameter is not used during standard training or evaluation, because we don't actually modify any sequence lengths: the model gets the entire instruction/input/output in one go, with attention masking used to control compression, and the position embeddings are correctly applied with the original instruction length.

The offset parameter is only used in `compress.py`: the `GistActivations` class accepts a "gist offset" argument which records the length of the instruction before caching:
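To make the shifting concrete, here is a minimal standalone sketch (not the repo's actual code; the sizes and offset are made-up example values) of rotary cos/sin tables being sliced at the cached-instruction length, which is what "already have the offset applied" amounts to:

```python
import torch

# Build ordinary rotary cos/sin tables for a small head dimension.
head_dim, max_len = 8, 32
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
t = torch.arange(max_len).float()
freqs = torch.outer(t, inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos_table, sin_table = emb.cos(), emb.sin()   # each [max_len, head_dim]

gist_offset = 7   # length of the cached instruction, in tokens (example value)
seq_len = 4       # number of new tokens processed after the cache

# Slicing the tables at gist_offset places the first new token at position 7,
# i.e. as if the cached instruction were still physically in front of it, so
# the rotary application itself needs no extra gist offset argument.
cos = cos_table[gist_offset : gist_offset + seq_len]
sin = sin_table[gist_offset : gist_offset + seq_len]
print(cos.shape, sin.shape)   # torch.Size([4, 8]) torch.Size([4, 8])
```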
Thanks for the reply, but my question is: the `apply_rotary_pos_emb()` function doesn't take the offset parameter, yet in the code I mentioned above, at line 206 of `/gisting/src/gist_llama.py`, the function is called with `offset` as an input argument. Doesn't that cause an error?
So when I run the LLaMA model, it raises "apply_rotary_pos_emb() got an unexpected keyword argument 'offset'".
The error that occurs during my LLaMA training stage is:

```
File "/data/wupf/gisting/src/gist_llama.py", line 206, in forward
    query_states, key_states = apply_rotary_pos_emb(
TypeError: apply_rotary_pos_emb() got an unexpected keyword argument 'offset'
```
Are you using the version of transformers specified in `requirements.txt`, specifically commit fb366b9a? You'll see that the function signature is different:

(And more generally, if you run into other issues it could be because you're not using the package versions specified in `requirements.txt`.)
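For reference, the mismatch is just the argument list; reconstructed approximately from memory (double-check against the actual sources), the two versions look like:

```python
# transformers around the pinned commit -- the offset-based call in
# gist_llama.py line 206 matches this:
def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0): ...

# newer transformers releases -- calling this with offset=... raises the
# TypeError reported above:
def apply_rotary_pos_emb(q, k, cos, sin, position_ids): ...
```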
I tried to install transformers at commit fb366b9a, but I can't install it on my server. All other packages are installed according to requirements.txt except transformers.
Unfortunately the codebase is only verified to work with commit fb366b9a. You might be able to get around this specific issue by just pasting in the `apply_rotary_pos_emb` function from the link above instead of importing it from `modeling_llama`, but I can't guarantee you won't run into additional issues.
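Concretely, that workaround could look roughly like the sketch below in `src/gist_llama.py` (the function body is reconstructed from memory of the offset-based version at that commit, so verify it against the pinned transformers source before relying on it):

```python
import torch

# Stop importing the newer, position_ids-based version:
# from transformers.models.llama.modeling_llama import apply_rotary_pos_emb  # (or however it was imported)

def rotate_half(x):
    # Rotate half of the hidden dims: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, offset: int = 0):
    # Offset-based variant: slice the cos/sin tables so the first token of
    # q/k is treated as sitting at position `offset` rather than position 0.
    cos = cos[..., offset : q.shape[-2] + offset, :]
    sin = sin[..., offset : q.shape[-2] + offset, :]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

With a local definition like this, the existing `apply_rotary_pos_emb(..., offset=...)` call at line 206 resolves to it instead of the newer transformers function.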
ok I fixed it, thanks!
Great!
https://github.com/jayelm/gisting/blob/acd78b49111db30c3a24c32f625b85ae59934585/src/gist_llama.py#L206

The `apply_rotary_pos_emb()` function does not accept the `offset` argument? Its definition starts with:

```python
def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
```