While studying the difference between the experimental Hyena-S5 model (development branch) and H3, I noticed that the S5SSM filter comes with a GeLU activation:

https://github.com/lindermanlab/S5/blob/008bd547890a17d6fce059f5de104c0d578b101b/configs/hyena_S5/wikitext_S5.yaml#L65
The filter finishes with this GeLU activation, which means the activated value is passed straight to the inner product. This seems strange compared to other approaches:

- H3 has no activation within the filter.
- Hyena uses an MLP whose depth is configured by `num_inner_mlps` (default 2).
Is it intentional that the GeLU output is passed straight to the inner product?
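For concreteness, here's a minimal sketch of the three filter endings as I understand them; the function names, parameter shapes, and the inter-layer activation in the Hyena MLP are illustrative assumptions, not the actual code from either repo:

```python
import jax

def s5ssm_filter(k_raw):
    # Hyena-S5 (dev branch): the filter ends in a GeLU, so the activated
    # kernel is what reaches the inner product.
    return jax.nn.gelu(k_raw)

def h3_filter(k_raw):
    # H3: the SSM kernel is used as-is; no activation inside the filter.
    return k_raw

def hyena_filter(t_emb, mlp_params, num_inner_mlps=2):
    # Hyena: the filter is an implicit MLP over position embeddings with
    # num_inner_mlps inner layers; the GeLU between layers here is only
    # illustrative (the real implementation makes its own choice).
    z = t_emb
    for w, b in mlp_params[:num_inner_mlps]:
        z = jax.nn.gelu(z @ w + b)
    w_out, b_out = mlp_params[num_inner_mlps]
    return z @ w_out + b_out
```

In all three cases the returned kernel is what enters the inner product with the input, so only Hyena-S5 hands over an activated value.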
Hi, thanks for reaching out! This is more of an artifact of porting over some original S5 code, as opposed to a conscious design decision. It should work just as well with this GeLU removed.
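For anyone trying this, a minimal sketch of the suggested change (the function name and flag are hypothetical, not the repo's actual API):

```python
import jax

def s5ssm_filter(k_raw, use_output_gelu=False):
    # The trailing GeLU was carried over from the original S5 port; with
    # the flag off, the raw kernel goes straight to the inner product.
    return jax.nn.gelu(k_raw) if use_output_gelu else k_raw
```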