Closed nickfraser closed 3 months ago
For some models, `num_attention_heads != num_key_value_heads`. This fixes the dimension of the `past_key_values` in this case.
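To illustrate the fix, here is a minimal sketch (the helper name and config attributes are hypothetical, modeled on Hugging Face-style naming): in grouped-query attention models, each `past_key_values` tensor is shaped with `num_key_value_heads`, which can be smaller than `num_attention_heads`.

```python
# Hypothetical sketch: each past key/value tensor in a GQA model has shape
# (batch, num_key_value_heads, seq_len, head_dim), NOT num_attention_heads.
def past_kv_shape(batch_size, num_key_value_heads, seq_len, head_dim):
    # Sizing the cache with num_attention_heads here would be the bug this PR fixes.
    return (batch_size, num_key_value_heads, seq_len, head_dim)

num_attention_heads = 32  # query heads
num_key_value_heads = 8   # fewer KV heads when the two differ
print(past_kv_shape(1, num_key_value_heads, 16, 64))  # -> (1, 8, 16, 64)
```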