Open TimeLessLing opened 4 years ago
Thanks for pointing this out. However, it does not matter, as they are symmetrical.
OK, I also have a question about that formulation. The original formulation has two weight matrices, U1 and U2, but in the code it seems you combined U1 and U2 by appending an all-ones vector, ones?
You can check the differences carefully to see whether they are equivalent. In any case, implementation details may differ slightly; performance is what matters. Do not get bogged down in trivial details.
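One way to run that equivalence check is a small numerical sketch. It assumes the paper's two-term score S[i, j] = h_head_j^T U1 h_dep_i + h_head_j^T u2 and that the ones column is appended to the dependent representations; all names and shapes here (H_dep, H_head, U1, u2, W) are illustrative and not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                       # hypothetical sentence length and hidden size
H_dep = rng.normal(size=(n, d))   # dependent representations, one row per token
H_head = rng.normal(size=(n, d))  # head representations, one row per token
U1 = rng.normal(size=(d, d))      # bilinear weight
u2 = rng.normal(size=(d,))        # head-only bias weight

# Two-term form: S[i, j] = h_head_j^T U1 h_dep_i + h_head_j^T u2
S_two_terms = H_dep @ U1.T @ H_head.T + (H_head @ u2)[None, :]

# Combined form: append a column of ones to H_dep and stack u2 onto U1,
# so a single matrix W = [U1 | u2] carries both terms.
ones = np.ones((n, 1))
H_dep_aug = np.concatenate([H_dep, ones], axis=1)   # (n, d + 1)
W = np.concatenate([U1, u2[:, None]], axis=1)       # (d, d + 1)
S_combined = H_dep_aug @ W.T @ H_head.T

print(np.allclose(S_two_terms, S_combined))
```

Under these assumptions the two score matrices agree up to floating-point error, because the ones column multiplies only the u2 part of W.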
Thank you for your advice. I now understand how the code corresponds to the formulas. But I have another question: I found that the softmax2d function in MST.py subtracts the maximum value:
y -= np.max(y, axis=1, keepdims=True)
I don't understand why the maximum value is subtracted. Does the maximum value have a special meaning?
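It prevents numerical overflow. Subtracting the same constant from every entry of a row does not change the softmax result, so the maximum is just a convenient constant. Suppose the input is [100000, 1000002]: how would you compute the softmax in practice?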
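To make the overflow concrete, here is a minimal sketch comparing a naive softmax with one that subtracts the row maximum first. The function names softmax2d_naive and softmax2d_stable are hypothetical and are not the repository's implementation; only the max-subtraction line mirrors the code quoted above.

```python
import numpy as np

def softmax2d_naive(x):
    # Exponentiate the raw scores directly; large inputs overflow to inf.
    e = np.exp(np.asarray(x, dtype=np.float64))
    return e / np.sum(e, axis=1, keepdims=True)

def softmax2d_stable(x):
    # Shift each row so its maximum is 0; every exponent is then <= 0.
    y = np.array(x, dtype=np.float64)
    y -= np.max(y, axis=1, keepdims=True)
    e = np.exp(y)
    return e / np.sum(e, axis=1, keepdims=True)

scores = np.array([[100000.0, 1000002.0]])

# Silence the overflow warnings so the naive result can be inspected.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax2d_naive(scores)
stable = softmax2d_stable(scores)

print(naive)   # both entries are nan, because inf / inf is undefined
print(stable)  # a valid probability distribution over the row
```

With the shift, the largest exponent is exp(0) = 1, so the denominator can never overflow and is always at least 1.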
In the file Model.py, in the forward function of class ParserModel:
arc_logit = self.arc_biaffine(x_arc_dep, x_arc_head)
and in the file Layer.py, in the forward function of class Biaffine:
biaffine = torch.transpose(torch.bmm(affine, input2), 1, 2)
which means the final result is affine * input2,
where affine is computed from input1, which is x_arc_dep, and input2 is x_arc_head. But in the original paper, the formulation is s^(arc) = H^(arc_head) * U^(1) * H^(arc_dep) + H^(arc_head) * u^(2)
It seems that the order of H(head) and H(dep) is reversed in the code.
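The ordering question can also be checked numerically. The sketch below uses NumPy in place of torch, ignores the bias term, and assumes hypothetical batch-first shapes (B, n, d); it is not the repository's code. It compares a dep-first product followed by a transpose with the paper's head-first form using the transposed weight.

```python
import numpy as np

rng = np.random.default_rng(0)
B, n, d = 2, 4, 3                      # hypothetical batch, length, hidden size
x_arc_dep = rng.normal(size=(B, n, d))
x_arc_head = rng.normal(size=(B, n, d))
W = rng.normal(size=(d, d))            # stand-in for the learned bilinear weight

# Code-style order: transform the dependents first, multiply by the heads,
# then swap the last two axes so the result is indexed [batch, head, dep].
affine = x_arc_dep @ W                                        # (B, n, d)
dep_first = affine @ np.transpose(x_arc_head, (0, 2, 1))      # [b, dep, head]
code_scores = np.transpose(dep_first, (0, 2, 1))              # [b, head, dep]

# Paper-style order: head^T U dep, with U taken to be W transposed.
U = W.T
paper_scores = np.einsum("bjp,pq,biq->bji", x_arc_head, U, x_arc_dep)

print(np.allclose(code_scores, paper_scores))
```

Since dep^T W head equals head^T W^T dep, swapping the argument order only changes which of W or its transpose is learned, which is consistent with the reply that the two orders are symmetrical.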