🚀 Feature Request
I read the article "SesameBERT: Attention for Anywhere" and would like to add SENet blocks to the Huggingface implementation of BERT. The article's authors released a TensorFlow implementation, but I would like to use the library in PyTorch.
Motivation
Squeeze-and-Excitation Network (SENet) blocks have achieved state-of-the-art results in computer vision, and they seem promising for NLP as well.
Your contribution
I know that it is possible to modify the `BertLayer` and `BertEncoder` classes.
Any suggestions on how to modify the code so that the idea from the article can be applied?
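As a starting point, here is a minimal sketch of a Squeeze-and-Excitation block in PyTorch, adapted to BERT-style hidden states of shape `[batch, seq_len, hidden_size]`. This is my own guess at an adaptation, not the paper's exact placement: the class name, the choice to pool over the sequence dimension, and the `reduction` ratio are all assumptions.

```python
import torch
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Hypothetical SE block for BERT hidden states.

    Squeezes over the sequence dimension, then gates each hidden
    channel with a learned sigmoid weight (standard SENet recipe,
    not necessarily the SesameBERT formulation).
    """

    def __init__(self, hidden_size: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, hidden_size // reduction)
        self.fc2 = nn.Linear(hidden_size // reduction, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Squeeze: global average pooling over the sequence dimension
        z = hidden_states.mean(dim=1)                       # [batch, hidden]
        # Excitation: bottleneck MLP with sigmoid gating
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))
        # Rescale hidden states channel-wise
        return hidden_states * s.unsqueeze(1)               # [batch, seq, hidden]
```

One could then insert this module inside a subclassed `BertLayer` (e.g. applied to the layer output before it is passed on), which would keep the change local without touching `BertEncoder` itself.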