Open cfunk1210 opened 6 years ago
Why are you using gated linear units (GLU) instead of the ReLU mentioned in the paper?
Also, why did you define your own GLU instead of using the built-in one? Was there a reason or was it just not implemented yet when you wrote this?
Thanks in advance.
Yes. The built-in GLU had not been implemented yet when we wanted to use it.
@hanzhanggit And what about the choice of GLU over ReLU?
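For anyone following this thread, here is a minimal sketch of what the two activations compute, in plain Python so it runs without any framework. The function names `relu` and `glu` and the sample values are illustrative only, not the repository's code. It assumes the standard GLU definition: split the input in half, then multiply the first half by the sigmoid of the second half.

```python
import math

def relu(xs):
    """ReLU: clamp each value at zero; output length equals input length."""
    return [max(0.0, x) for x in xs]

def glu(xs):
    """GLU: split the input in half and gate the first half with sigmoid(second half)."""
    if len(xs) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(xs) // 2
    values, gates = xs[:half], xs[half:]
    # Each output is value * sigmoid(gate), so the output has half as many channels.
    return [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(values, gates)]

channels = [1.0, -2.0, 0.0, 3.0]  # hypothetical activations for four channels
print(relu(channels))  # four outputs, negatives zeroed
print(glu(channels))   # two outputs, each scaled by a learned-style gate in (0, 1)
```

Two practical differences follow from this: GLU halves the channel count, so the preceding layer has to produce twice as many channels as with ReLU, and negative values can pass through GLU instead of being zeroed. In current PyTorch the built-in versions are `torch.nn.GLU(dim=...)` and `torch.nn.functional.glu(input, dim=...)`. Assuming the hand-written module splits along the channel dimension of an NCHW tensor, `dim=1` would be the matching setting.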