Fix/data: handle models with specific `num_key_value_heads`

huggingface / optimum-amd

AMD related optimizations for transformer models

https://huggingface.co/docs/optimum/amd/index

MIT License

Closed by nickfraser 3 months ago

nickfraser commented 3 months ago

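For some models (for example, those using grouped-query attention), `num_attention_heads != num_key_value_heads`. This PR fixes the dimension of `past_key_values` in that case, so that the head dimension follows `num_key_value_heads` rather than `num_attention_heads`.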
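
To illustrate the shape issue, here is a minimal sketch rather than the actual patch. It assumes the common `(batch, heads, sequence, head_dim)` cache layout and a config object exposing `hidden_size`, `num_attention_heads`, and optionally `num_key_value_heads`. The helper name `past_key_value_shape` and the example config values are hypothetical.

```python
from types import SimpleNamespace


def past_key_value_shape(config, batch_size, past_sequence_length):
    """Return the assumed shape of one key (or value) tensor in the cache.

    Hypothetical helper: assumes a (batch, heads, sequence, head_dim) layout.
    """
    # The per-head width is still derived from the number of query heads.
    head_dim = config.hidden_size // config.num_attention_heads

    # Use the key/value head count when the config defines one;
    # otherwise fall back to the number of attention heads.
    num_kv_heads = getattr(config, "num_key_value_heads", None)
    if num_kv_heads is None:
        num_kv_heads = config.num_attention_heads

    return (batch_size, num_kv_heads, past_sequence_length, head_dim)


# Illustrative configs (values are assumptions, not taken from the PR).
gqa_config = SimpleNamespace(
    hidden_size=4096, num_attention_heads=32, num_key_value_heads=8
)
mha_config = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(past_key_value_shape(gqa_config, 2, 16))  # (2, 8, 16, 128)
print(past_key_value_shape(mha_config, 2, 16))  # (2, 32, 16, 128)
```

With the first config, using `num_attention_heads` for the cache would produce 32 heads instead of 8, which is the kind of mismatch described above.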