keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Add the gMLP Encoder Block #98

Closed abheesht17 closed 1 month ago

abheesht17 commented 2 years ago

The gMLP model is from the paper "Pay Attention to MLPs". It has a decent number of citations, around 40. Each encoder block consists only of linear projections and a "spatial gating unit"; there is no self-attention. It would be a good addition to the library: the research community is actively looking for alternatives to self-attention, and despite its simplicity, gMLP achieves performance comparable to Transformers.
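For reference, here is a minimal sketch of the block as described in the paper, written against the Keras 3 API (`keras.ops`). The class names (`SpatialGatingUnit`, `GMLPBlock`) and the dimensions are illustrative, not from this repo, and the sketch assumes a fixed sequence length, since the spatial projection is a dense layer over the token axis:

```python
import keras
from keras import layers, ops


class SpatialGatingUnit(layers.Layer):
    """Splits channels in half and gates one half with a linear
    projection applied across the sequence (token) dimension."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.norm = layers.LayerNormalization(epsilon=1e-6)

    def build(self, input_shape):
        # Requires a static sequence length.
        seq_len = input_shape[1]
        # Spatial projection over the token axis, initialized near
        # identity (small weights, unit bias) as the paper suggests.
        self.spatial_proj = layers.Dense(
            seq_len,
            kernel_initializer=keras.initializers.RandomNormal(stddev=1e-3),
            bias_initializer="ones",
        )

    def call(self, x):
        u, v = ops.split(x, 2, axis=-1)
        v = self.norm(v)
        v = ops.transpose(v, (0, 2, 1))  # (batch, channels, seq_len)
        v = self.spatial_proj(v)
        v = ops.transpose(v, (0, 2, 1))  # back to (batch, seq_len, channels)
        return u * v


class GMLPBlock(layers.Layer):
    """One gMLP encoder block: norm -> channel projection -> SGU ->
    channel projection, wrapped in a residual connection."""

    def __init__(self, dim, dim_ffn, **kwargs):
        super().__init__(**kwargs)
        self.norm = layers.LayerNormalization(epsilon=1e-6)
        self.proj_in = layers.Dense(dim_ffn, activation="gelu")
        self.sgu = SpatialGatingUnit()
        self.proj_out = layers.Dense(dim)

    def call(self, x):
        shortcut = x
        x = self.norm(x)
        x = self.proj_in(x)
        x = self.sgu(x)
        x = self.proj_out(x)
        return x + shortcut
```

Usage on a dummy batch, just to show the shapes are preserved:

```python
x = keras.random.normal((2, 128, 256))
block = GMLPBlock(dim=256, dim_ffn=1024)
y = block(x)  # shape (2, 128, 256)
```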

chenmoneygithub commented 2 years ago

@abheesht17 Thanks for opening this feature request!

The idea of the paper is definitely interesting! But at this moment I am not convinced that gMLP is a good replacement for the Transformer. The paper claims fewer parameters are required, but we can also control parameter count directly, by adjusting the number of encoder layers in a Transformer, or the number of blocks and the size of their SGUs in gMLP. We will have more discussion on this next week, and you could also add it to your GSoC proposal if you want. Thanks again!
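To make the parameter trade-off concrete, a quick check using the illustrative sketch above (the count depends entirely on the assumed `dim`, `dim_ffn`, and sequence length, none of which come from this thread):

```python
# Build the sketch block on a fixed-length input, then count weights.
block = GMLPBlock(dim=256, dim_ffn=1024)
block(keras.random.normal((1, 128, 256)))
print(block.count_params())
# The SGU's spatial projection contributes seq_len * seq_len + seq_len
# weights (128 * 128 + 128 here), so the comparison with self-attention
# depends on sequence length as much as on the number of blocks.
```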

abheesht17 commented 2 years ago

Awesome! Thanks, @chenmoneygithub. Will add it to the doc :)