Minami-su opened this issue 7 months ago
Then I tried installing that branch with `pip install git+https://github.com/tomaarsen/attention_sinks.git@model/qwen_fa` and ran my test script (roughly the sketch at the end of this report). This error happens:
The repository for Qwen-7B-Chat2 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/Qwen-7B-Chat2.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
The model is automatically converting to fp16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 8/8 [00:07<00:00, 1.06it/s]
[Attention Sinks] Injected Position Shifting into 32 attention classes.
[Attention Sinks] Injected Attention Sink KV Cache into 1 model class.
Traceback (most recent call last):
File "/home/luhao/test.py", line 36, in <module>
generated_tokens = model.generate(
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat2/modeling_qwen.py", line 1261, in generate
return super().generate(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/generation/utils.py", line 1623, in generate
return self.contrastive_search(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/generation/utils.py", line 2007, in contrastive_search
outputs = self(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat2/modeling_qwen.py", line 1045, in forward
transformer_outputs = self.transformer(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/attention_sinks/inject_mixin.py", line 131, in wrapped_forward
outputs = old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat2/modeling_qwen.py", line 893, in forward
outputs = block(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat2/modeling_qwen.py", line 612, in forward
attn_outputs = self.attn(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/attention_sinks/models/qwen/pos_shift.py", line 217, in qwen_pos_shift_attention_forward
causal_mask = registered_causal_mask[:, :, key.size(-2) - query.size(-2) : key.size(-2), : key.size(-2)]
TypeError: 'NoneType' object is not subscriptable
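For context, the `/home/luhao/test.py` in the traceback is roughly along these lines. This is only a minimal sketch: the attention sink sizes, prompt, and generation settings (`penalty_alpha`, `top_k`, `max_new_tokens`) are assumptions; the traceback only confirms the model path, `trust_remote_code=True`, and that `generate()` goes through `contrastive_search`.

```python
# Minimal sketch of the kind of script that produces the traceback above.
# Exact values (sink size, window size, penalty_alpha, top_k, max_new_tokens)
# are assumptions, not the original script.
import torch
from transformers import AutoTokenizer
from attention_sinks import AutoModelForCausalLM

model_path = "Qwen-7B-Chat2"  # local copy of Qwen-7B-Chat with custom modeling code

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,            # required for Qwen's custom modeling_qwen.py
    torch_dtype=torch.float16,
    device_map="auto",
    attention_sink_size=4,             # assumed default
    attention_sink_window_size=1020,   # assumed default
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# penalty_alpha + top_k make generate() take the contrastive_search path,
# which is where the traceback ends up.
generated_tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    penalty_alpha=0.6,
    top_k=4,
)
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```

The failure itself is in `attention_sinks/models/qwen/pos_shift.py` line 217, where `registered_causal_mask` is `None` when the injected attention forward tries to slice it.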