Hi,
First, I cannot figure out what h_w1 and h_w2 actually mean in the paper. The paper never describes them in connection with the equation below.
Then I searched for them in the code, and as far as I can tell they are exactly the same. Please correct me if I'm wrong. The calculation of S(w1, r, w2) uses the code below.
However, both trs0_rel and trs1_rel are linear layers with the same dimensions. The same input is passed to both of them, so I think their outputs should also be the same.
self.trs0_rel = nn.Linear(self.hid_size2, self.hid_size)
self.trs1_rel = nn.Linear(self.hid_size2, self.hid_size)
So my question is: if one linear layer is enough, why are two listed with different notations?
Any explanation is greatly appreciated. Thank you.
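One way to test the assumption that the two outputs are the same is a small experiment. Below is a minimal sketch in plain Python, not the repository's code. `TinyLinear` is a hypothetical toy stand-in for `nn.Linear`, the sizes are made up, and the uniform random initialization is chosen only for illustration. It builds two layers with identical dimensions, feeds both the same input, and compares the results.

```python
import random


class TinyLinear:
    """Toy stand-in for a linear layer: y = W x + b (hypothetical, for illustration)."""

    def __init__(self, in_features, out_features, rng):
        # Each instance draws its OWN random weights and bias.
        bound = 1.0 / in_features ** 0.5  # illustrative init range (assumption)
        self.weight = [
            [rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(out_features)
        ]
        self.bias = [rng.uniform(-bound, bound) for _ in range(out_features)]

    def __call__(self, x):
        # Matrix-vector product plus bias, one output per weight row.
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


rng = random.Random(0)          # fixed seed so the run is repeatable
hid_size2, hid_size = 4, 3      # made-up sizes, not the repository's values

trs0_rel = TinyLinear(hid_size2, hid_size, rng)
trs1_rel = TinyLinear(hid_size2, hid_size, rng)

x = [1.0, 2.0, 3.0, 4.0]        # the same input goes to both layers
out0 = trs0_rel(x)
out1 = trs1_rel(x)

same_output = out0 == out1
print("same shape:", len(out0) == len(out1))
print("same output:", same_output)
```

In this sketch the two layers have the same shape but separately drawn parameters, so the same input does not produce the same output. Whether that also holds for trs0_rel and trs1_rel in the actual code depends on whether their weights are created and trained independently, which is an assumption here.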