It has been a while since I last implemented a gated CNN, but if my memory serves, a gated CNN is nothing special, just a replacement for an ordinary activation function. Say you have a vector x that you want to linearly transform and then apply an activation function to. Instead of computing, say, ReLU(f(x)), where f is a linear transformation, you compute two linear transformations f(x) and g(x), apply the non-linear activation to one of them, say ReLU(g(x)), and finally multiply f(x) and ReLU(g(x)) together element-wise. The dimensions of f(x) and x can differ, but f(x) and g(x) have to be the same size for the element-wise multiplication to work.
Let me know if my understanding is wrong.
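For concreteness, here is a minimal PyTorch sketch of what I mean; the module name, layer sizes, and the choice of ReLU are just illustrative, not code from this repo:

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Two linear maps, a non-linearity on one of them, then an element-wise product."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.f = nn.Linear(in_features, out_features)  # "content" path
        self.g = nn.Linear(in_features, out_features)  # "gate" path

    def forward(self, x):
        # f(x) and g(x) have the same output size, so the element-wise
        # product is well defined; the input size can be different.
        return self.f(x) * torch.relu(self.g(x))

x = torch.randn(4, 16)
layer = GatedLinear(16, 32)
print(layer(x).shape)  # torch.Size([4, 32])
```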
My understanding of the Gated Linear Unit isn't very concrete either. I am sharing its implementation in PyTorch: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#glu
That said, I would rather vote for your solution to this problem, as it preserves the dimensions even in residual layers (unlike the PyTorch GLU implementation).
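Something like the following is what I have in mind for a dimension-preserving gated convolution; this is only my guess at what your solution looks like, with made-up channel counts and kernel size, not code from this issue:

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """Two parallel convolutions with identical output channels;
    one is passed through a sigmoid and used as the gate."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # keep the sequence length unchanged
        self.content = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=padding)

    def forward(self, x):
        # Output shape matches the input shape, so it can be added
        # back to x inside a residual block.
        return self.content(x) * torch.sigmoid(self.gate(x))

x = torch.randn(2, 64, 100)        # (batch, channels, length)
print(GatedConv1d(64)(x).shape)    # torch.Size([2, 64, 100])
```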
GLU(x):
    half = len(x) // 2
    return x[:half] * sigmoid(x[half:])
This is how I interpret the GLU function.
It's an activation function that also halves the size of the tensor along the dimension it splits. Can you clarify your interpretation of GLU, and how you have avoided reducing the size of the tensor's dimensions?
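To make the dimension reduction concrete, a quick check with the stock PyTorch function (the shapes here are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 8)
y = F.glu(x, dim=-1)   # split x into halves a, b along dim, return a * sigmoid(b)
print(y.shape)         # torch.Size([2, 4]) -- the last dimension is halved
```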