lintangsutawika closed this issue 2 years ago.
Thanks Lintang!
The weights in EncDecAttention and SelfAttention need to be transposed.
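For illustration, a minimal sketch of what that transpose step could look like during conversion. The helper name is hypothetical; the shape convention follows the usual layouts (Flax/T5X `Dense` kernels are stored as `(in_features, out_features)`, while PyTorch's `torch.nn.Linear` keeps weights as `(out_features, in_features)`):

```python
import numpy as np

def convert_attention_kernel(t5x_kernel: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map a T5X attention projection kernel to the
    layout HF's PyTorch T5 expects.

    Flax/T5X Dense kernels are stored as (in_features, out_features);
    torch.nn.Linear weights are (out_features, in_features). Because the
    q/k/v/o projections here are square, the shapes match either way and
    only the values reveal the mismatch, hence the transpose.
    """
    return np.asarray(t5x_kernel).T
```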
How did you figure out that they need to be transposed, given that they share the same dimensions?
I loaded both the T5X version and the HF version of mT5-Large and compared the weight matrices layer by layer. I noticed that `np.array_equal(hf_weights, t5x_weights)` did not return `True` for SelfAttention and EncDecAttention, so I tried transposing them. It turned out to work!
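For reference, a minimal sketch of that layer-by-layer comparison, assuming the two checkpoints have already been loaded into name-to-array dicts (the dict contents and names here are placeholders, not the exact script used in this issue):

```python
import numpy as np

def diff_checkpoints(hf_weights: dict, t5x_weights: dict) -> None:
    """Compare two name->array weight dicts and report layers that only
    match after a transpose. Square attention kernels have identical
    shapes either way, so the values themselves must be compared."""
    for name, hf_w in hf_weights.items():
        t5x_w = t5x_weights[name]
        if np.array_equal(hf_w, t5x_w):
            continue  # exact match, nothing to do
        if hf_w.ndim == 2 and np.array_equal(hf_w, t5x_w.T):
            print(f"{name}: matches only after transpose")
        else:
            print(f"{name}: values differ even after transpose")

# Toy example: a square kernel stored in the opposite layout
k = np.arange(16, dtype=np.float32).reshape(4, 4)
diff_checkpoints({"SelfAttention.q": k}, {"SelfAttention.q": k.T})
# -> SelfAttention.q: matches only after transpose
```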
Ah neat! So `hf_weights` and `t5x_weights` are the same for all the other layers?