mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Error happened #72

Open ForrestPi opened 9 months ago

ForrestPi commented 9 months ago

When running CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming, the following error occurs:

File "/data0/XXX/nlp/streaming-llm/streaming_llm/pos_shift/modify_llama.py", line 87, in llama_pos_shift_attention_forward kv_seq_len += past_key_value[0].shape[-2] File "/home/XXX/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/cache_utils.py", line 78, in getitem raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}") KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

gjm441 commented 8 months ago

Same error. Have you solved it?

gjm441 commented 8 months ago

> Same error. Have you solved it?

I have solved it. Use transformers==4.33.0.
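For anyone hitting the same error, pinning the dependency as suggested above looks like:

```bash
pip install transformers==4.33.0
```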