Closed jamie-murdoch closed 7 years ago
Hi Jamie,
For the leaf module, the implementation gives h = o * tanh(Wx), whereas the definition in the paper gives h = o * tanh(i * tanh(Wx)). This is a simplification that is omitted from the paper. The reasoning is the following: since o already gates the output, i appears to be redundant and is therefore fixed to 1. Now tanh(tanh(z)) is approximately tanh(z), which gives the expression used in the implementation. In practice, I don't think that the delta here should make much of a difference.
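To get a feel for the size of that delta, here is a rough numerical sketch comparing the two leaf expressions with i fixed to 1. It is not code from the repository: the function names (leaf_paper, leaf_impl, max_gap) and the scalar setup, where z stands in for a single component of Wx, are assumptions made purely for illustration.

```python
import math

def leaf_paper(z, i, o):
    # Leaf output as defined in the paper: h = o * tanh(i * tanh(z)), with z = Wx.
    return o * math.tanh(i * math.tanh(z))

def leaf_impl(z, o):
    # Leaf output as in the implementation: h = o * tanh(z).
    return o * math.tanh(z)

def max_gap(zs, o=1.0):
    # Largest absolute difference between the two forms when i is fixed to 1.
    return max(abs(leaf_impl(z, o) - leaf_paper(z, 1.0, o)) for z in zs)

# Hypothetical pre-activation values on a grid from -5.0 to 5.0.
zs = [k / 10 for k in range(-50, 51)]
near_zero = [z for z in zs if abs(z) <= 0.5]

print("max gap for |z| <= 0.5:", max_gap(near_zero))
print("max gap for |z| <= 5.0:", max_gap(zs))
```

With o = 1 the gap is tanh(z) - tanh(tanh(z)), which is small near z = 0 and grows with |z|, approaching 1 - tanh(1) (about 0.24) once tanh(z) saturates. So the approximation is tightest for small pre-activations, and since o scales both forms equally, the gap shrinks further whenever the output gate is below 1.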
Hope that helps.
Hi!
Could you point to where the leaf module implemented in the BinaryTreeLSTM object is described in the paper? I can't seem to find it, but it seems like a non-trivial part of the model.