lucidrains / g-mlp-pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
MIT License
417 stars 58 forks

Parameter count doesn't line up with paper #4

Closed titu1994 closed 3 years ago

titu1994 commented 3 years ago

Just a note (and correct me if I misunderstood the paper) -

The parameter count for the Tiny gMLP doesn't line up with the param count from the paper for 30 layers, 128 dim, and 6 ff_mult. That's probably due to the doubling of parameters here - https://github.com/lucidrains/g-mlp-pytorch/blob/main/g_mlp_pytorch/g_mlp_pytorch.py#L111

Halving this back to dim_ff, and correspondingly halving the dims in all three lines here - https://github.com/lucidrains/g-mlp-pytorch/blob/main/g_mlp_pytorch/g_mlp_pytorch.py#L64-L66

The param count then comes out to roughly 5.5M.
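For anyone wanting to sanity-check the arithmetic, below is a rough back-of-the-envelope parameter count, assuming the corrected layout (hidden dim of dim * ff_mult, split in half by the spatial gating unit). The function name is made up for illustration, and the seq_len of 196 is an assumption (224x224 ImageNet images with 16x16 patches); it is not taken from this repo's code.

```python
# Hypothetical per-block parameter count for gMLP-Tiny after the fix.
# Assumed config: dim=128, ff_mult=6, depth=30, seq_len=196 (assumption:
# 224x224 images, 16x16 patches). Excludes patch embedding and classifier head.

def gmlp_block_params(dim, ff_mult, seq_len):
    dim_ff = dim * ff_mult                  # 768 -- no doubling
    half = dim_ff // 2                      # SGU splits the hidden dim in two
    pre_norm = 2 * dim                      # LayerNorm weight + bias
    proj_in = dim * dim_ff + dim_ff         # channel projection in
    sgu_norm = 2 * half                     # LayerNorm inside the gating unit
    spatial = seq_len * seq_len + seq_len   # token-mixing Linear(n, n)
    proj_out = half * dim + dim             # channel projection back out
    return pre_norm + proj_in + sgu_norm + spatial + proj_out

total = 30 * gmlp_block_params(dim=128, ff_mult=6, seq_len=196)
print(f"{total / 1e6:.2f}M")  # ~5.64M, in the same ballpark as the ~5.5M above
```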

lucidrains commented 3 years ago

@titu1994 Hi Somshubra! I made the changes in 0.0.9 - could you let me know if it matches up now?

titu1994 commented 3 years ago

Param counts are now very close to the paper, thanks!