Closed by abheesht17, 1 month ago
@abheesht17 Thanks for opening this feature request!
The idea of the paper is definitely interesting! But at this moment I am not convinced that gMLP is a good replacement for the Transformer. It claims that fewer parameters would be required, but we can also control the parameter count by adjusting the number of encoders or their SGUs. We will have more discussion on this next week, and you could also add it to your GSoC proposal if you want. Thanks again!
Awesome! Thanks, @chenmoneygithub. Will add it to the doc :)
The gMLP model is from the paper "Pay Attention to MLPs". It has a decent number of citations (around 40). Each encoder block consists only of linear layers and a "spatial gating unit" (SGU). It would be a good addition to the library: the research community is actively looking for alternatives to self-attention, and despite its simplicity, gMLP achieves performance comparable to Transformers.
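For reference, here is a rough NumPy sketch of a single gMLP block and its spatial gating unit, based on my reading of the paper. The shapes, the simplified layer norm, and all the variable names here are illustrative, not the paper's exact implementation (the paper does, however, initialize the spatial weights near zero and the spatial bias at one, so gating starts close to identity):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def spatial_gating_unit(y, w_spatial, b_spatial):
    """Split channels in half; gate one half with a spatial projection of the other."""
    u, v = np.split(y, 2, axis=-1)                      # each: (seq_len, d_ffn // 2)
    # Simplified layer norm over the channel dimension
    v = (v - v.mean(axis=-1, keepdims=True)) / np.sqrt(v.var(axis=-1, keepdims=True) + 1e-6)
    v = w_spatial @ v + b_spatial                       # linear map along the sequence axis
    return u * v

def gmlp_block(x, w_in, w_out, w_spatial, b_spatial):
    """One gMLP block: channel projection -> GELU -> SGU -> channel projection, plus residual."""
    y = gelu(x @ w_in)                                  # (seq_len, d_ffn)
    y = spatial_gating_unit(y, w_spatial, b_spatial)    # (seq_len, d_ffn // 2)
    return x + y @ w_out                                # residual connection

rng = np.random.default_rng(0)
seq_len, d_model, d_ffn = 4, 8, 16
x = rng.standard_normal((seq_len, d_model))
w_in = rng.standard_normal((d_model, d_ffn)) * 0.1
w_out = rng.standard_normal((d_ffn // 2, d_model)) * 0.1
w_spatial = np.zeros((seq_len, seq_len))   # spatial weights initialized near zero
b_spatial = np.ones((seq_len, 1))          # spatial bias initialized at one

out = gmlp_block(x, w_in, w_out, w_spatial, b_spatial)
print(out.shape)  # (4, 8)
```

Note there is no attention and no positional embedding anywhere: the only cross-token mixing is the `w_spatial @ v` projection along the sequence axis, which is what makes the block so much simpler than a Transformer encoder layer.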