yongzx / multilingual-t0

Multilingual extension of T0

Update convert.py #4

Closed lintangsutawika closed 2 years ago

lintangsutawika commented 2 years ago
  1. Changed AutoModel to AutoModelForSeq2SeqLM so that the MT5ForConditionalGeneration is loaded.
  2. Mapped "target/decoder/logits_dense/kernel" to "lm_head"
  3. Weights on EncDecAttention and SelfAttention need to be transposed.
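A minimal sketch (pure NumPy; the parameter names and shapes here are illustrative, not the actual convert.py code) of fixes 2 and 3 above: remap the T5X `logits_dense` kernel to HF's `lm_head`, and transpose the attention kernels.

```python
import numpy as np

d_model, vocab = 4, 6

# Hypothetical T5X-style flat checkpoint: {parameter path: array}.
t5x_params = {
    "target/decoder/logits_dense/kernel":
        np.arange(d_model * vocab, dtype=np.float32).reshape(d_model, vocab),
    "target/encoder/layers_0/attention/query/kernel":
        np.arange(d_model * d_model, dtype=np.float32).reshape(d_model, d_model),
}

# Fix 2: map the final projection onto the HF lm_head parameter name.
NAME_MAP = {"target/decoder/logits_dense/kernel": "lm_head.weight"}

def convert(params):
    hf = {}
    for name, w in params.items():
        # Fix 3: the EncDecAttention/SelfAttention kernels must be
        # transposed so the T5X (in, out) layout matches the HF weights.
        if "attention" in name and name.endswith("/kernel"):
            w = w.T
        hf[NAME_MAP.get(name, name)] = w
    return hf

hf_params = convert(t5x_params)
```

Fix 1 is simply loading with `AutoModelForSeq2SeqLM.from_pretrained(...)` instead of `AutoModel`, so that the auto class resolves an mT5 config to `MT5ForConditionalGeneration` (which has the `lm_head`) rather than the bare encoder-decoder.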
yongzx commented 2 years ago

Thanks Lintang!

> Weights on EncDecAttention and SelfAttention need to be transposed.

How did you figure out that they need to be transposed, given that they share the same dimensions?

lintangsutawika commented 2 years ago

I loaded both the T5X version and the HF version of mT5-Large and compared the weight matrices layer by layer. I noticed that np.array_equal(hf_weights, t5x_weights) returned False for the SelfAttention and EncDecAttention weights, so I tried transposing them. Turns out it worked!
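This diagnostic can be shown on a toy square matrix (as the attention kernels are d_model × d_model, the transpose has the same shape, so only an element-wise comparison reveals the layout mismatch; the variables here are stand-ins, not real checkpoint weights):

```python
import numpy as np

# Stand-in for a T5X attention kernel: square but not symmetric.
t5x_w = np.arange(16, dtype=np.float32).reshape(4, 4)
# Stand-in for the corresponding HF weight, stored in transposed layout.
hf_w = t5x_w.T

print(np.array_equal(hf_w, t5x_w))    # False: same shape, different layout
print(np.array_equal(hf_w, t5x_w.T))  # True: they match after transposing
```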

yongzx commented 2 years ago

Ah neat! So hf_weights and t5x_weights are the same for all other layers?