Closed: richelbilderbeek closed this issue 3 years ago
Thanks for the suggestion!
The layers and their arguments (including the activation function) are specified in the model definition file (e.g. models/M1.json). Since Swish is implemented in TensorFlow, any user interested in trying it out can easily switch to it by specifying it in the layer arguments instead of elu.
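As a minimal sketch of what that switch amounts to (assuming TensorFlow 2.x, where Swish is available as a built-in Keras activation; the layer type and size below are illustrative, not taken from models/M1.json):

```python
import tensorflow as tf

# The only thing that changes between the two variants is the activation argument.
elu_layer = tf.keras.layers.Dense(64, activation="elu")      # current GCAE default
swish_layer = tf.keras.layers.Dense(64, activation="swish")  # Swish alternative

x = tf.random.normal((8, 128))
print(elu_layer(x).shape, swish_layer(x).shape)  # both (8, 64)
```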
It's an interesting idea, and we might experiment with it. However, we will keep the example model in this repository as the one corresponding to the paper, and leave it to users to try different models and training schemes.
GCAE uses the exponential linear unit ('ELU') as an activation function. In [1] it is claimed that 'the Swish activation function would be better in all cases [over ELU]'.
I am unsure whether you think it would be worth trying out Swish: the improvements in accuracy shown in [1] are only minor.
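For reference, a small sketch of the two activations under discussion, using their standard definitions (with alpha and beta set to the commonly used default of 1.0; this is not code from the GCAE repository):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, smooth exponential saturation below zero."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    """Swish (SiLU when beta=1): the input scaled by a sigmoid gate."""
    return x / (1.0 + np.exp(-beta * x))

xs = np.linspace(-3, 3, 7)
print(np.round(elu(xs), 3))
print(np.round(swish(xs), 3))
```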