When I run CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming, I get the following error:
  File "/data0/XXX/nlp/streaming-llm/streaming_llm/pos_shift/modify_llama.py", line 87, in llama_pos_shift_attention_forward
    kv_seq_len += past_key_value[0].shape[-2]
  File "/home/XXX/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/cache_utils.py", line 78, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
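The traceback suggests a transformers version mismatch: in newer transformers (roughly v4.36+), past_key_value is a Cache object (e.g. DynamicCache) rather than the legacy tuple of (key, value) tensors, and on the first forward pass the cache holds 0 layers, so indexing past_key_value[0] raises the KeyError above. One way to sketch a compatibility shim is to branch on which API is in use — note that past_kv_seq_len below is a hypothetical helper for illustration, not part of streaming-llm or transformers:

```python
def past_kv_seq_len(past_key_value, layer_idx=0):
    """Return the cached sequence length under either KV-cache API.

    Assumption: newer transformers passes a Cache object exposing
    get_seq_length(layer_idx); older versions pass a per-layer tuple of
    (key, value) tensors shaped [batch, heads, seq_len, head_dim].
    """
    if past_key_value is None:
        # No cache yet (first forward pass with the legacy API).
        return 0
    if hasattr(past_key_value, "get_seq_length"):
        # New Cache API: an empty cache simply reports length 0 instead of
        # raising KeyError on layer indexing.
        return past_key_value.get_seq_length(layer_idx)
    # Legacy API: seq_len is the second-to-last dim of the cached key tensor.
    return past_key_value[0].shape[-2]
```

Then line 87 would become kv_seq_len += past_kv_seq_len(past_key_value, self.layer_idx). The simpler workaround is to pin transformers to the version streaming-llm was developed against.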