lindermanlab / S5

MIT License

Hyena-S5 SSM has a strange activation setup #6

Status: Closed (ishitatsuyuki closed 11 months ago)

ishitatsuyuki commented 1 year ago

While studying the differences between the experimental Hyena-S5 model (development branch) and H3, I noticed that the S5SSM filter is configured with a GeLU activation:

https://github.com/lindermanlab/S5/blob/008bd547890a17d6fce059f5de104c0d578b101b/configs/hyena_S5/wikitext_S5.yaml#L65

The filter ends with this GeLU activation, so the activated value is passed straight to the inner product. This seems strange compared to other approaches.

Is it intentional that the GeLU output is passed straight to the inner product?
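To make the question concrete, here is a minimal sketch of the two variants in plain Python. It is not the repository's code. It assumes the "inner product" is an elementwise gating product as in H3 and Hyena, and the names `gate`, `ssm_out`, and `v` are hypothetical stand-ins for the filter output and the gating projection.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gate(filter_out, v, activate=True):
    """Multiply the filter output elementwise with a gating signal v.

    With activate=True the filter output goes through GeLU first, which is
    the setup described above. With activate=False the raw filter output
    enters the product directly.
    """
    if activate:
        filter_out = [gelu(y) for y in filter_out]
    return [y * g for y, g in zip(filter_out, v)]


# Hypothetical values standing in for the SSM filter output and the gate.
ssm_out = [-2.0, -0.5, 0.0, 1.5]
v = [1.0, 1.0, 1.0, 1.0]

with_gelu = gate(ssm_out, v, activate=True)
without_gelu = gate(ssm_out, v, activate=False)

print("with GeLU:   ", with_gelu)
print("without GeLU:", without_gelu)
```

Because GeLU is bounded below (its minimum is roughly -0.17), the activated variant feeds an almost non-negative signal into the product, while the unactivated variant keeps the filter output's sign and magnitude.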

jimmysmith1919 commented 11 months ago

Hi, thanks for reaching out! This is more of an artifact of porting over some original S5 code than a conscious design decision. It should work just as well with this GeLU removed.