i404788 / s5-pytorch

PyTorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5)
Mozilla Public License 2.0

Encoder #4

Closed · yhl48 closed this issue 6 months ago

yhl48 commented 6 months ago

Thanks for the good work. Doesn't the encoder come before the S5 layer, or am I missing something here?

https://github.com/i404788/s5-pytorch/blob/5aad8a13c8c7c76e34b8786258e59a2f88e52936/s5/s5_model.py#L352

i404788 commented 6 months ago

Standard "-former" structure is: Skip(Norm->Attn/S5)->Skip(Norm->FFN/GLU), in this case FFN is the GLU variant. The paper doesn't specify any structure for the final model, but the reference code has the similar structure: https://github.com/lindermanlab/S5/blob/3c18fdb6b06414da35e77b94b9cd855f6a95ef17/s5/layers.py#L63-L90

Although they use Skip(Norm->S5->FFN/GLU), i.e. a single residual skip wrapping both sublayers.
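
For concreteness, here's a minimal PyTorch sketch of the two wirings. `PreNormBlock`, `mixer`, and `ffn` are hypothetical names for illustration, not the actual modules in s5_model.py:

```python
import torch
from torch import nn

class PreNormBlock(nn.Module):
    """Pre-norm residual wiring: Skip(Norm->Attn/S5) -> Skip(Norm->FFN/GLU)."""
    def __init__(self, d_model: int, mixer: nn.Module, ffn: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mixer = mixer  # S5 (or attention): any (B, L, d_model) -> (B, L, d_model) module
        self.ffn = ffn      # feed-forward sublayer, e.g. the GLU variant

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))  # Skip(Norm -> Attn/S5)
        x = x + self.ffn(self.norm2(x))    # Skip(Norm -> FFN/GLU)
        return x

# Reference-code variant (lindermanlab/S5): one skip around both sublayers,
#   x = x + self.ffn(self.mixer(self.norm1(x)))   # Skip(Norm -> S5 -> FFN/GLU)
```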

In case you are confused about the ff_enc naming: it's actually a 2-layer MLP that fans in and fans out around the GLU. So you encode into the FF dim, apply the GLU, and then decode back to the common hidden dim.
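
A minimal sketch of that fan-in/fan-out, assuming an `nn.GLU`-style sigmoid gate; the `glu_ffn` name and the sizes are hypothetical, and the actual ff_enc in s5_model.py may gate differently (e.g. with GELU):

```python
import torch
from torch import nn

d_model, d_ff = 256, 1024  # hypothetical hidden and FF dims

glu_ffn = nn.Sequential(
    nn.Linear(d_model, 2 * d_ff),  # "ff_enc" fan-in; doubled because GLU halves the last dim
    nn.GLU(dim=-1),                # splits into (value, gate): value * sigmoid(gate) -> d_ff
    nn.Linear(d_ff, d_model),      # "ff_dec" fan-out back to the common hidden dim
)

x = torch.randn(2, 16, d_model)    # (batch, seq, hidden)
assert glu_ffn(x).shape == x.shape
```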