Closed: yhl48 closed this issue 6 months ago.
Standard "-former" structure is: Skip(Norm->Attn/S5)->Skip(Norm->FFN/GLU)
, in this case FFN is the GLU variant. The paper doesn't specify any structure for the final model, but the reference code has the similar structure:
https://github.com/lindermanlab/S5/blob/3c18fdb6b06414da35e77b94b9cd855f6a95ef17/s5/layers.py#L63-L90
Although there they use Skip(Norm->S5->FFN/GLU), i.e. a single skip wrapping both the S5 and the FFN/GLU.
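To make the two layouts concrete, here is a minimal PyTorch sketch. The class names and the `mixer`/`ffn` constructor arguments are illustrative placeholders, not the repo's actual API:

```python
import torch.nn as nn

class StandardFormerBlock(nn.Module):
    """Standard layout: Skip(Norm->Attn/S5) -> Skip(Norm->FFN/GLU)."""
    def __init__(self, dim, mixer, ffn):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mixer = mixer  # attention or an S5 layer
        self.ffn = ffn      # GLU-variant feed-forward

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))  # first skip: Norm -> Attn/S5
        x = x + self.ffn(self.norm2(x))    # second skip: Norm -> FFN/GLU
        return x

class ReferenceS5Block(nn.Module):
    """Reference-code layout: Skip(Norm->S5->FFN/GLU), one residual branch."""
    def __init__(self, dim, s5, ffn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.s5 = s5
        self.ffn = ffn

    def forward(self, x):
        return x + self.ffn(self.s5(self.norm(x)))  # single skip around everything
```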
In case you are confused about the ff_enc naming: it's actually a 2-layer MLP that fans in and fans out around the GLU, so you encode into the FF dim, apply the GLU, and then decode back to the common hidden dim.
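A minimal sketch of that fan-in/fan-out pattern, assuming the encoder produces `2 * ff_dim` features because `F.glu` halves the last dimension (the actual repo code may split the gate differently):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    """2-layer MLP around a GLU: fan in to the FF dim, gate, fan back out."""
    def __init__(self, dim, ff_dim):
        super().__init__()
        # fan-in; width is 2*ff_dim because F.glu halves the last dimension
        self.ff_enc = nn.Linear(dim, 2 * ff_dim)
        # fan-out back to the common hidden dim
        self.ff_dec = nn.Linear(ff_dim, dim)

    def forward(self, x):
        return self.ff_dec(F.glu(self.ff_enc(x), dim=-1))

# maps (batch, seq, dim) -> (batch, seq, dim)
ff = GLUFeedForward(dim=128, ff_dim=256)
y = ff(torch.randn(2, 10, 128))
```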
Thanks for the good work. Doesn't the encoder come before the S5 layer, or am I missing something here?
https://github.com/i404788/s5-pytorch/blob/5aad8a13c8c7c76e34b8786258e59a2f88e52936/s5/s5_model.py#L352