Function parameters are in reverse order in biaffine class

zhangmeishan / BiaffineDParser

BiAffine Dependency Parsing

Function parameters are in reverse order in biaffine class #2

Open TimeLessLing opened 4 years ago

TimeLessLing commented 4 years ago

In Model.py, the forward function of class ParserModel calls `arc_logit = self.arc_biaffine(x_arc_dep, x_arc_head)`, and in Layer.py, the forward function of class Biaffine computes `biaffine = torch.transpose(torch.bmm(affine, input2), 1, 2)`. So the final result is `affine` times `input2`, where `affine` is computed from `input1` (which is `x_arc_dep`) and `input2` is `x_arc_head`. In the original paper, however, the formula is $s^{(\text{arc})} = H^{(\text{arc-head})} U^{(1)} H^{(\text{arc-dep})} + H^{(\text{arc-head})} u^{(2)}$, so the order of $H^{(\text{arc-head})}$ and $H^{(\text{arc-dep})}$ seems to be reversed in the code.
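To make the index order of the quoted line concrete, here is a NumPy mirror of it. This is a sketch, not the repository's code: the weight `W`, the sizes, and the name `bmm_biaffine` are hypothetical, a plain matrix stands in for however `affine` is computed from `input1`, and bias columns are omitted. Since `torch.bmm` needs its second argument shaped `(B, D, L2)`, the sketch transposes `input2` explicitly. It checks that entry `[b, j, i]` equals `input1[b, i] @ W @ input2[b, j]`, i.e. the first argument (`x_arc_dep`) sits on the left of the weight.

```python
import numpy as np

def bmm_biaffine(input1, input2, W):
    """Mirror of: affine from input1, then transpose(bmm(affine, input2^T), 1, 2).

    input1: (B, L1, D1), input2: (B, L2, D2), W: (D1, D2) is a hypothetical weight.
    Returns scores of shape (B, L2, L1).
    """
    affine = input1 @ W                                # (B, L1, D2)
    scores = affine @ np.transpose(input2, (0, 2, 1))  # (B, L1, L2), like torch.bmm
    return np.transpose(scores, (0, 2, 1))             # (B, L2, L1), like torch.transpose(., 1, 2)

rng = np.random.default_rng(0)
B, L, D = 2, 5, 4
x_dep = rng.standard_normal((B, L, D))   # stands in for x_arc_dep (input1)
x_head = rng.standard_normal((B, L, D))  # stands in for x_arc_head (input2)
W = rng.standard_normal((D, D))

s = bmm_biaffine(x_dep, x_head, W)
# s[b, j, i] = x_dep[b, i] @ W @ x_head[b, j]
expected = np.einsum("bid,de,bje->bji", x_dep, W, x_head)
print(s.shape, np.allclose(s, expected))  # (2, 5, 5) True
```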

zhangmeishan commented 4 years ago

Thanks for pointing this out. However, it doesn't matter, since they are symmetrical.
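One way to read "symmetrical": in a bilinear term, swapping the two arguments is the same as transposing the weight, since $a^\top U b = b^\top U^\top a$, and because $U$ is learned the model can fit either orientation. The check below is a NumPy sketch with random, hypothetical values for one sentence; it covers only the bilinear part, not the bias term.

```python
import numpy as np

rng = np.random.default_rng(1)
L, D = 6, 4
H_dep = rng.standard_normal((L, D))   # hypothetical dependent representations
H_head = rng.standard_normal((L, D))  # hypothetical head representations
U = rng.standard_normal((D, D))       # hypothetical learned bilinear weight

# Paper-style ordering: score[j, i] = H_head[j] @ U @ H_dep[i]
paper_order = H_head @ U @ H_dep.T
# Dependent-first ordering with the transposed weight:
# score[i, j] = H_dep[i] @ U.T @ H_head[j], then transposed to index [head, dep]
code_order = (H_dep @ U.T @ H_head.T).T

print(np.allclose(paper_order, code_order))  # True
```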

TimeLessLing commented 4 years ago

OK. I also have a question about that formulation. The original formulation has two weights, U^(1) and u^(2), but in the code you seem to combine them into one by appending a vector of all ones (`ones`). Is that right?

zhangmeishan commented 4 years ago

You can compare the two carefully to check whether they are equivalent. In any case, implementation details may differ slightly; what matters is performance. Don't get bogged down in trivial details.
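The equivalence can be checked numerically. Assuming the constant 1 is appended to the dependent vector only (this sketch does not confirm where the repository appends it), the two weights stack into one augmented matrix:

$$[r^{(\text{dep})}; 1]^\top \begin{bmatrix} U^{(1)\top} \\ u^{(2)\top} \end{bmatrix} h^{(\text{head})} = h^{(\text{head})\top} U^{(1)} r^{(\text{dep})} + h^{(\text{head})\top} u^{(2)}$$

A NumPy sketch with random, hypothetical weights:

```python
import numpy as np

rng = np.random.default_rng(2)
L, D = 6, 4
H_dep = rng.standard_normal((L, D))   # hypothetical dependent representations
H_head = rng.standard_normal((L, D))  # hypothetical head representations
U1 = rng.standard_normal((D, D))      # bilinear weight U^(1)
u2 = rng.standard_normal(D)           # head-bias weight u^(2)

# Paper form: s[j, i] = H_head[j] @ U1 @ H_dep[i] + H_head[j] @ u2
paper = H_head @ U1 @ H_dep.T + (H_head @ u2)[:, None]

# Ones trick: append a constant 1 to each dependent vector and stack the
# two weights into one (D + 1, D) matrix W = [U1^T; u2^T].
ones = np.ones((L, 1))
H_dep_aug = np.concatenate([H_dep, ones], axis=1)  # (L, D + 1)
W = np.vstack([U1.T, u2[None, :]])                 # (D + 1, D)
# s[i, j] = [H_dep[i]; 1] @ W @ H_head[j], transposed to index [head, dep]
combined = (H_dep_aug @ W @ H_head.T).T

print(np.allclose(paper, combined))  # True
```

The appended 1 selects the last row of `W`, which is why a single matrix can carry both the bilinear weight and the head-only bias term.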

TimeLessLing commented 4 years ago

Thank you for your advice. I now understand how the code corresponds to the formulas. But I have another question: the softmax2d function in MST.py subtracts the maximum value, `y -= np.max(y, axis=1, keepdims=True)`, and I don't understand why. What is special about the maximum value?

zhangmeishan commented 4 years ago

It prevents numerical overflow. Suppose the input is [100000, 1000002]: how would you compute its softmax in practice?
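Subtracting the row maximum leaves the result unchanged, because $\mathrm{softmax}(x)_k = e^{x_k - c} / \sum_m e^{x_m - c}$ for any constant $c$. Choosing $c = \max_m x_m$ makes every exponent at most 0, so `np.exp` cannot overflow. A minimal NumPy sketch; the names `naive_softmax2d` and `stable_softmax2d` are hypothetical, and both normalize along `axis=1` like the line quoted above:

```python
import numpy as np

def naive_softmax2d(x):
    e = np.exp(x)  # overflows to inf for large inputs
    return e / np.sum(e, axis=1, keepdims=True)

def stable_softmax2d(x):
    # Shift each row by its maximum: mathematically the same softmax,
    # but the largest exponent becomes exp(0) = 1.
    y = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(y)
    return e / np.sum(e, axis=1, keepdims=True)

x = np.array([[100000.0, 1000002.0]])
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax2d(x))  # [[nan nan]]: exp overflows to inf, and inf / inf = nan
print(stable_softmax2d(x))     # [[0. 1.]]
```

Unlike the in-place `y -= ...` in the thread, this sketch builds a shifted copy so the caller's array is not modified.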