CSerxy closed this issue 2 years ago
Hi, in my experience, initialization matters. I initialized Compacter's weights, for all the values you mentioned and in all experiments, with:
"phm_c_init": ["normal"],
"phm_init_range": [0.0001]
Here are the lines where this is done:
thanks.
Hi rabeehk,
Thanks for your quick and detailed response!!
Do I understand correctly that you initialize W with normal(mean=0, std=0.0001)? And that, to obtain a similar matrix W when you decompose it into a sum of products of A_i and B_i, you initialize each A_i and B_i with normal(mean=0, std=0.01)?
Many thanks!
Hi, not really. I initialized them all with normal(mean=0, std=0.0001). Perhaps the way you mentioned makes more sense, but this is how I did it when running the experiments.
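For anyone comparing the two choices, here is a rough sketch of how the scale of W differs between them. It is not the repository's code. It assumes W is the sum of n Kronecker products of A_i and B_i, that all entries are independent and zero-mean, and it uses n = 4 purely as an example; the function names are made up for illustration. Under those assumptions every entry of W is a sum of n products a_i * b_i, so its std is sqrt(n) * std_a * std_b.

```python
import math
import random
import statistics


def phm_entry_std(n, std_a, std_b):
    """Analytic std of one entry of W = sum_i kron(A_i, B_i).

    Assumes independent zero-mean entries, so the variances of the
    n products a_i * b_i simply add up.
    """
    return math.sqrt(n) * std_a * std_b


def sample_phm_entries(n, std_a, std_b, num_samples=20000, seed=0):
    """Draw single entries of W empirically (each is a sum of n products)."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, std_a) * rng.gauss(0.0, std_b) for _ in range(n))
        for _ in range(num_samples)
    ]


n = 4  # hypothetical number of Kronecker terms, chosen only for illustration

# Every factor drawn with std=0.0001 (what was used in the experiments).
std_same = phm_entry_std(n, 1e-4, 1e-4)

# Every factor drawn with std=0.01 (the alternative suggested in the question).
std_sqrt = phm_entry_std(n, 1e-2, 1e-2)

# Monte Carlo check of the analytic value for the second setting.
empirical = statistics.pstdev(sample_phm_entries(n, 1e-2, 1e-2))

print(f"factors at 1e-4: entry std of W ~ {std_same:.1e}")
print(f"factors at 1e-2: entry std of W ~ {std_sqrt:.1e} (sampled {empirical:.1e})")
```

So with std=0.0001 on every factor, the entries of W start several orders of magnitude smaller than 0.0001, while std=0.01 on every factor lands in the 0.0001 range (up to the sqrt(n) factor). If B_i is further factored into s_i and t_i, the same argument applies once more to each product of their entries.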
Gotcha, thanks for the answer!!
I wonder what kind of initialization you used when decomposing the W matrix. I am also curious whether using different initializations for the weight matrices matters.
Assuming the initialization of W follows a normal distribution with mean = a and std = b, how did you initialize $A_i$ and $B_i$? Also, if you further decompose $B_i$ into $s_i$ and $t_i$, how did you initialize those?
Many thanks!