Thanks so much for such a great code base. I noticed that the original authors' implementation of SCA-CNN applies doubly stochastic regularization to the attention context vectors.
I wasn't able to find this in your code, but I'm not sure whether I just missed it.
Thanks for the reminder. Sorry, I haven't implemented the regularization, since I was only trying to compare different attention mechanisms. I will keep improving the code so that it fully implements SCA-CNN.
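For anyone who wants to add it in the meantime, here is a minimal sketch of the doubly stochastic penalty as described in "Show, Attend and Tell" (Xu et al., 2015): the loss gets an extra term lam * sum_i (1 - sum_t alpha[t, i])**2, which pushes every location to receive roughly one unit of attention summed over all decoding steps. This is not this repo's code or the SCA-CNN authors' exact code. The function name `doubly_stochastic_penalty`, the `(batch, T, L)` layout, the default `lam`, and averaging over the batch are all assumptions. It also assumes the penalty is applied to spatial attention weights; which SCA-CNN attention map(s) the original authors regularize is not stated here.

```python
import numpy as np


def doubly_stochastic_penalty(alphas, lam=1.0):
    """Hypothetical helper: lam * sum_i (1 - sum_t alphas[b, t, i])**2, averaged over the batch b.

    alphas: attention weights of shape (batch, T, L), i.e. a distribution over
    L locations at each of T decoding steps (each alphas[b, t, :] sums to 1).
    Assumption: spatial attention; the same formula could be applied to channel weights.
    """
    alphas = np.asarray(alphas, dtype=float)
    if alphas.ndim != 3:
        raise ValueError("expected alphas with shape (batch, T, L)")
    # Total attention each location receives across all T steps: shape (batch, L).
    coverage = alphas.sum(axis=1)
    # Squared deviation from 1, summed over locations: shape (batch,).
    per_example = ((1.0 - coverage) ** 2).sum(axis=1)
    # Batch averaging is an assumption; the paper writes the term per example.
    return lam * per_example.mean()
```

As a usage example: with T = L = 4 and uniform weights of 0.25, every location's coverage is exactly 1, so the penalty is 0; if every step attends only to location 0, the coverage is [4, 0, 0, 0] and the penalty is (1 - 4)**2 + 3 * 1 = 12. The term would be added to the caption cross-entropy loss during training.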