qinhui99 opened this issue 6 years ago
That's not how you use any RNN. The number of units is not the batch size, and the recurrent activation is not supposed to be changed from sigmoid, as that changes the meaning of the operations.
As for stacking them, it's a specific set of instructions really, but it may not improve performance that much.
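To illustrate the point about the recurrent activation, here is a rough sketch of the SRU recurrence as described in the SRU paper (not necessarily this repo's exact code; the weight names and shapes are illustrative). The gates f_t and r_t go through a sigmoid so they stay in (0, 1) and can interpolate between old and new state; swapping in 'elu' breaks that interpretation:

```python
# Sketch of one SRU step (per the SRU paper, not this repo's exact code),
# showing why the recurrent activation is a sigmoid: f_t and r_t are gates
# in (0, 1), so the cell state is a convex blend of old state and new input.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_step(x_t, c_prev, Wx, Wf, bf, Wr, br):
    x_tilde = Wx @ x_t                    # candidate input
    f_t = sigmoid(Wf @ x_t + bf)          # forget gate, values in (0, 1)
    r_t = sigmoid(Wr @ x_t + br)          # reset/highway gate, values in (0, 1)
    # The interpolation below only makes sense if f_t and r_t are in [0, 1].
    c_t = f_t * c_prev + (1.0 - f_t) * x_tilde
    h_t = r_t * np.tanh(c_t) + (1.0 - r_t) * x_t  # highway connection (assumes matching dims)
    return h_t, c_t
```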
Normally we really can't change recurrent_activation to 'elu'. But I tested your SRU and it worked with 'elu', and I got better performance than with 'sigmoid'. Also, I tested SRU with 'elu' on CPU, and its speed was about double that of GRU on CPU. Your SRU is very good. :) :+1:
I tried to use SRU like this: SRU(batch_size, dropout=0., recurrent_dropout=0., unroll=True, implementation=1, recurrent_activation='elu')
It worked and got some improvement, but it still can't match the GRU's score. So I want to try stacked SRUs. Can you give me some suggestions?
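For reference, a minimal sketch of how stacking recurrent layers usually looks in Keras. It uses keras.layers.GRU as a stand-in, since the SRU layer's import path isn't shown in this thread; if this SRU follows the standard Keras recurrent API it should drop in the same way. The unit counts, input shape, and class count are placeholders, and the first positional argument is the number of units, not the batch size:

```python
# Stacking sketch with keras.layers.GRU as a stand-in for the SRU layer.
# timesteps, features, num_classes, and the 128 units are illustrative placeholders.
from keras.models import Sequential
from keras.layers import GRU, Dense

timesteps, features, num_classes = 50, 64, 10

model = Sequential()
# Every recurrent layer except the last needs return_sequences=True so the
# next recurrent layer still receives a 3D (batch, timesteps, features) input.
model.add(GRU(128, return_sequences=True, dropout=0., recurrent_dropout=0.,
              unroll=True, implementation=1, input_shape=(timesteps, features)))
model.add(GRU(128, return_sequences=True, dropout=0., recurrent_dropout=0.,
              unroll=True, implementation=1))
model.add(GRU(128, dropout=0., recurrent_dropout=0., unroll=True, implementation=1))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
```

Note that unroll=True requires the timestep dimension to be fixed, which is why input_shape spells it out here.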