duplicated bias in the forget gate of ChildSumTreeLSTM.lua

stanfordnlp / treelstm

Tree-structured Long Short-Term Memory networks (http://arxiv.org/abs/1503.00075)

GNU General Public License v2.0

877 stars 234 forks source link

duplicated bias in the forget gate of ChildSumTreeLSTM.lua #6

Closed supakjk closed 8 years ago

supakjk commented 8 years ago

Hello,

At the 40th and 41th lines of models/ChildSumTreeLSTM.lua, to compose the forget gate output, the outputs of both nn.TemporalConvolution and nn.Linear are summed. Since both of the modules have bias terms, additional bias term will be added differently to the formulation in the ACL2015 paper.

I wonder if I missed some part resolving the issue, it was intended, or a bug.

Thanks.

kaishengtai commented 8 years ago

As you point out, there is some redundant parameterization in the implementation due to these additional bias terms. This is intended in order to simplify the implementation (so that nn modules can be used without modification). At any rate, there aren't many bias parameters, so this detail shouldn't have a significant impact on performance.

supakjk commented 8 years ago

I see. Since recent nn.Linear code supports not using the bias term (by passing false as the 3rd argument), you can now evade the duplication without module modification.