sogou / SogouMRCToolkit

This toolkit was designed for the fast and efficient development of modern machine comprehension models, including both published models and original prototypes.
Apache License 2.0

fix Highway layer bug #8

Closed SunnyMarkLiu closed 5 years ago

SunnyMarkLiu commented 5 years ago

In the original Highway Networks paper, the output of the highway layer is $y = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))$, and:

For highway layers, we use the transform gate defined as $T(x) = \sigma(W_T x + b_T)$, where $W_T$ is the weight matrix and $b_T$ the bias vector for the transform gates.

So the transform gate $T$ uses a sigmoid activation, and the affine transform $H$ uses a non-linear activation such as ReLU. The highway code in this repo has the two reversed, so I created this pull request to fix it.
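To make the intended behavior concrete, here is a minimal pure-Python sketch of a single highway layer following the formula above. It is not the toolkit's code: the names `highway_layer`, `affine`, `w_h`, `b_h`, `w_t`, and `b_t` are hypothetical, and ReLU is assumed as the non-linearity for $H$.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit, assumed here as the non-linearity for H."""
    return max(0.0, z)


def affine(weights, bias, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Hypothetical highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H uses ReLU (the affine transform) and T uses sigmoid (the transform
    gate). Input and output must have the same dimension so that x can be
    carried through unchanged.
    """
    h = [relu(z) for z in affine(w_h, b_h, x)]      # candidate output H(x)
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]   # gate T(x), each in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]
```

With a strongly negative gate bias `b_t`, the gate stays near 0 and the layer passes `x` through almost unchanged. With a strongly positive bias it returns roughly `H(x)`. That interpolation only works when the gate is bounded in (0, 1), which is why the sigmoid belongs on the gate rather than on the transform.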

Thanks.

SunnyMarkLiu commented 5 years ago

@yukyang @wujindou This pull request is waiting for review. Thanks!

yylun commented 5 years ago

thx for better naming