zepingyu0512 / srnn

sliced-rnn

weight sharing question #10

Closed KaitoHH closed 6 years ago

KaitoHH commented 6 years ago

I have read your paper, and you mention that SRNN can be made equal to a standard RNN if the initial parameters are set appropriately. However, in your code you do not use any weight-sharing tricks, right? Then why did you choose activation=None in the GRU unit? Would it be better to simply use the default activation function? Thanks.

zepingyu0512 commented 6 years ago

Yes, you are right. The proof in the paper is meant to show why SRNN can work: we prove that SRNN is equal to a standard RNN when the recurrent activation function is linear. In the experiments, however, both SRNN's and RNN's recurrent units use non-linear activation functions, because non-linear functions are more expressive than linear ones, and the results show that SRNN with a non-linear activation still outperforms RNN. In Keras, "activation=None" means no activation function is applied in the recurrent layer; the default is "tanh", and using tanh gives similar results. The Keras documentation has more details: https://keras.io/layers/recurrent/#gru
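
For anyone comparing the two settings, here is a minimal sketch (not taken from this repository; the unit count and input shape are arbitrary placeholders, assuming Keras 2.x) showing the default GRU versus the activation=None GRU discussed above:

```python
# Minimal sketch of the two GRU settings discussed above (Keras 2.x assumed);
# the layer size and input shape are illustrative placeholders, not the paper's values.
from keras.models import Sequential
from keras.layers import GRU

# Default GRU: tanh is applied to the candidate hidden state (non-linear recurrence).
model_default = Sequential([GRU(64, input_shape=(100, 200))])

# activation=None: no activation is applied, i.e. a linear recurrence --
# the setting under which the paper's SRNN/RNN equivalence proof holds.
model_linear = Sequential([GRU(64, activation=None, input_shape=(100, 200))])

model_default.summary()
model_linear.summary()
```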

KaitoHH commented 6 years ago

Thanks for the reply, and best of luck with your future research!