rabeehk / compacter


Initialization questions #2

Closed CSerxy closed 2 years ago

CSerxy commented 2 years ago

I wonder what kind of initialization you used when decomposing the W matrix. I am also curious whether different initializations of the weight matrices matter.

Assuming the initialization of W follows a normal distribution with mean = a and std = b, how did you initialize $A_i$ and $B_i$? Also, if you further decompose $B_i$ into $s_i$ and $t_i$, how did you initialize those?

Many thanks!

rabeehk commented 2 years ago

Hi, in my experience initialization matters. In all experiments I initialized Compacter's weights, for all the values you mentioned, with:

"phm_c_init": ["normal"],
"phm_init_range": [0.0001]

Here are the lines where this is done:

https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/hypercomplex/layers.py#L147

https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/hypercomplex/layers.py#L117

https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/third_party/models/t5/modeling_t5.py#L1644

thanks.
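For readers who want to see the shape of this scheme, here is a minimal sketch of building a weight as a sum of Kronecker products, $W = \sum_i A_i \otimes B_i$, with every factor drawn from normal(mean=0, std=0.0001), optionally with $B_i = s_i t_i^\top$ as in the question. It is not the repository's code (see the linked lines for that); it uses only the Python standard library, and the helper names `normal_matrix`, `kron`, `outer`, and `phm_weight` are hypothetical.

```python
import random

def normal_matrix(rows, cols, std, rng):
    """Matrix (list of lists) with i.i.d. normal(0, std) entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def kron(a, b):
    """Kronecker product of two list-of-lists matrices."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

def outer(s, t):
    """Rank-one matrix s t^T from two vectors."""
    return [[s_i * t_j for t_j in t] for s_i in s]

def phm_weight(n, d_in, d_out, std=1e-4, low_rank=False, seed=0):
    """Sum of n Kronecker products A_i (n x n) and B_i (d_in/n x d_out/n).

    Assumption: all factors use the same normal(0, std) initialization,
    matching the values quoted in this thread.
    """
    rng = random.Random(seed)
    p, q = d_in // n, d_out // n
    w = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(n):
        a = normal_matrix(n, n, std, rng)
        if low_rank:
            # B_i = s_i t_i^T, with s_i and t_i also drawn from normal(0, std)
            s = [rng.gauss(0.0, std) for _ in range(p)]
            t = [rng.gauss(0.0, std) for _ in range(q)]
            b = outer(s, t)
        else:
            b = normal_matrix(p, q, std, rng)
        k = kron(a, b)
        for r in range(d_in):
            for c in range(d_out):
                w[r][c] += k[r][c]
    return w

w = phm_weight(n=4, d_in=8, d_out=12)
print(len(w), len(w[0]))  # shape of the assembled weight
```

Because each entry of W is a sum of products of two small numbers, the assembled W starts out much smaller than its individual factors.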

CSerxy commented 2 years ago

Hi rabeehk,

Thanks for your quick and detailed response!!

Can I understand it as follows: you initialize W with normal(mean=0, std=0.0001), and to obtain a similar matrix W when you decompose it into the sum of products of A_i and B_i, you initialize each A_i and B_i with normal(mean=0, std=0.01)?

Many thanks!

rabeehk commented 2 years ago

Hi, not really. I initialized them all with normal(mean=0, std=0.0001). Perhaps the way you mentioned makes more sense, but this is how I did it when running the experiments.
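For reference, the arithmetic behind the two options can be worked out under a simplifying assumption that is not stated in the thread: the entries of $A_i$ and $B_i$ are independent draws from normal(mean=0, std=$\sigma$), and $W = \sum_{i=1}^{n} A_i \otimes B_i$. Each entry of $W$ is then a sum of $n$ independent products $a\,b$, so

$$
\mathbb{E}[ab] = 0, \qquad \operatorname{Var}(ab) = \sigma^2 \cdot \sigma^2 = \sigma^4, \qquad \operatorname{std}(W_{jk}) = \sqrt{n}\,\sigma^2 .
$$

With $\sigma = 0.01$ this gives $\operatorname{std}(W_{jk}) = \sqrt{n} \cdot 10^{-4}$, which is the scale suggested in the question above; with $\sigma = 0.0001$, as used in the experiments, it gives $\sqrt{n} \cdot 10^{-8}$.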

CSerxy commented 2 years ago

Gotcha, thanks for the answer!!