Open olifei opened 9 months ago
Cause: in the Llama-2-70B transformer architecture, num_heads and num_key_value_heads are not equal (grouped-query attention), so the attention computation in the forward pass hits a dimension mismatch. Fix it by following the transformers implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L263
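For context, the transformers code linked above handles the head-count mismatch by expanding the key/value heads to match the number of query heads before the attention matmul. A minimal sketch of that `repeat_kv` helper (shapes assumed to be `(batch, num_kv_heads, seq_len, head_dim)`, as in the linked file):

```python
import torch


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each KV head n_rep times along the head dimension.

    (batch, num_key_value_heads, seq_len, head_dim)
      -> (batch, num_key_value_heads * n_rep, seq_len, head_dim)

    Equivalent to torch.repeat_interleave(x, repeats=n_rep, dim=1).
    """
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # Insert a new axis, broadcast-expand it, then flatten it into the head axis.
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_key_value_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
```

With Llama-2-70B (64 query heads, 8 KV heads), calling `repeat_kv(key_states, 64 // 8)` brings the key/value tensors to 64 heads so the `q @ k^T` shapes line up.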