titu1994 / keras-SRU

Implementation of Simple Recurrent Unit in Keras

SRU is faster, but gets lower accuracy scores #7

Open qinhui99 opened 6 years ago

qinhui99 commented 6 years ago

I compared SRU with GRU and LSTM on the IMDB dataset. SRU was the fastest, but it got the lowest accuracy score. My log is here:

SRU:

```
Epoch 1/5
10s - loss: 0.6226 - acc: 0.6436 - val_loss: 0.5841 - val_acc: 0.6807
Epoch 2/5
6s - loss: 0.4984 - acc: 0.7571 - val_loss: 0.5790 - val_acc: 0.7018
Epoch 3/5
6s - loss: 0.3955 - acc: 0.8204 - val_loss: 0.6177 - val_acc: 0.7202
Epoch 4/5
6s - loss: 0.3052 - acc: 0.8668 - val_loss: 0.6947 - val_acc: 0.7243
Epoch 5/5
6s - loss: 0.2293 - acc: 0.9030 - val_loss: 0.8090 - val_acc: 0.7266
Test score: 0.809049695206
Test accuracy: 0.726640000038
```

GRU:

```
Epoch 1/5
20s - loss: 0.4735 - acc: 0.7584 - val_loss: 0.3708 - val_acc: 0.8388
Epoch 2/5
12s - loss: 0.2609 - acc: 0.8932 - val_loss: 0.3774 - val_acc: 0.8362
Epoch 3/5
12s - loss: 0.1740 - acc: 0.9346 - val_loss: 0.4637 - val_acc: 0.8232
Epoch 4/5
12s - loss: 0.1132 - acc: 0.9593 - val_loss: 0.5032 - val_acc: 0.8160
Epoch 5/5
12s - loss: 0.0691 - acc: 0.9765 - val_loss: 0.7080 - val_acc: 0.8158
Test score: 0.708041801739
Test accuracy: 0.815840000038
```

LSTM:

```
Epoch 1/5
26s - loss: 0.4353 - acc: 0.7924 - val_loss: 0.4062 - val_acc: 0.8214
Epoch 2/5
16s - loss: 0.2580 - acc: 0.8982 - val_loss: 0.3686 - val_acc: 0.8398
Epoch 3/5
16s - loss: 0.1756 - acc: 0.9352 - val_loss: 0.4138 - val_acc: 0.8276
Epoch 4/5
16s - loss: 0.1143 - acc: 0.9592 - val_loss: 0.5257 - val_acc: 0.8198
Epoch 5/5
16s - loss: 0.0783 - acc: 0.9717 - val_loss: 0.6960 - val_acc: 0.8167
Test score: 0.696038662281
Test accuracy: 0.816680000038
```
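For reproducibility, here is a sketch of the kind of comparison script behind these logs. The hyperparameters (`max_features`, `maxlen`, layer width) are illustrative assumptions, not necessarily my exact settings:

```python
# Sketch of the benchmark: swap SRU, GRU, or LSTM in as the recurrent layer.
# Hyperparameters here are illustrative assumptions, not the exact run settings.
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, Dense, GRU, LSTM
from sru import SRU  # the SRU layer from this repository (assumes sru.py is on the path)

max_features = 20000   # vocabulary size
maxlen = 80            # cut reviews after this many words
units = 128

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

for rnn in (SRU, GRU, LSTM):
    model = Sequential()
    model.add(Embedding(max_features, 128))
    model.add(rnn(units))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=32, epochs=5,
              validation_data=(x_test, y_test))
    score, acc = model.evaluate(x_test, y_test, batch_size=32)
    print(rnn.__name__, 'Test score:', score, 'Test accuracy:', acc)
```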

When I tested SRU in PyTorch, it was not only faster than GRU but also got a better accuracy score. So can you tell me how to get a better accuracy score with SRU than with GRU here?

titu1994 commented 6 years ago

I'm not quite sure why the performance is significantly lower. I think there is some error in how this implementation was done.

qinhui99 commented 6 years ago

I tested MinimalRNNCell on the IMDB dataset. It was faster than SRU and got a better accuracy score. :(

MinimalRNNCell:

```
Epoch 1/5
5s - loss: 0.6160 - acc: 0.7472 - val_loss: 0.5196 - val_acc: 0.7948
Epoch 2/5
3s - loss: 0.3548 - acc: 0.8660 - val_loss: 0.4381 - val_acc: 0.7985
Epoch 3/5
4s - loss: 0.2714 - acc: 0.9022 - val_loss: 0.5969 - val_acc: 0.8154
Epoch 4/5
3s - loss: 0.1974 - acc: 0.9277 - val_loss: 0.4511 - val_acc: 0.8057
Epoch 5/5
4s - loss: 0.1566 - acc: 0.9428 - val_loss: 0.4905 - val_acc: 0.8124
Test score: 0.490511267147
Test accuracy: 0.812359999981
```
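By MinimalRNNCell I mean the example custom cell from the Keras documentation, wrapped in `keras.layers.RNN` as a drop-in replacement for the recurrent layer; roughly this sketch:

```python
# Sketch: the MinimalRNNCell example from the Keras docs, wrapped in
# keras.layers.RNN so it can replace SRU/GRU in the benchmark above.
import keras
from keras import backend as K
from keras.layers import RNN

class MinimalRNNCell(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units  # required so RNN knows the state shape
        super(MinimalRNNCell, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform', name='kernel')
        self.recurrent_kernel = self.add_weight(
            shape=(self.units, self.units),
            initializer='uniform', name='recurrent_kernel')
        self.built = True

    def call(self, inputs, states):
        # output = x.W + h_prev.U, with the output reused as the next state
        prev_output = states[0]
        h = K.dot(inputs, self.kernel)
        output = h + K.dot(prev_output, self.recurrent_kernel)
        return output, [output]

# usage: layer = RNN(MinimalRNNCell(32))
```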

Maybe there are some bugs in SRU. But I can't find them.

titu1994 commented 6 years ago

I have my finals now. I won't be able to look at them until the end of the month.

qinhui99 commented 6 years ago

That is ok. I will wait for you. Besides, I found a funny thing: I used two selu activations in SRU. My code looks like this:

```python
rnn_layer1 = SRU(16, dropout=0., recurrent_dropout=0.,
                 activation='selu', implementation=0,
                 unit_forget_bias=True, unroll=True,
                 recurrent_activation='selu')(emb_item_desc)
```

It worked and got a better score than tanh or hard_sigmoid. Thank you for your hard work. :+1:
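In case anyone wants to reproduce the comparison, a sweep over the two activation pairs looks roughly like this. This is a sketch: the input shape, vocabulary size, and the `emb_item_desc` embedding are illustrative stand-ins for my model, not its real settings:

```python
# Sketch: compare the default activations (tanh / hard_sigmoid) against selu.
# Input shape and embedding sizes below are illustrative placeholders.
from keras.layers import Input, Embedding
from sru import SRU  # SRU layer from this repository

inp = Input(shape=(80,))
emb_item_desc = Embedding(20000, 128)(inp)

for act, rec_act in [('tanh', 'hard_sigmoid'), ('selu', 'selu')]:
    rnn_layer1 = SRU(16, dropout=0., recurrent_dropout=0.,
                     activation=act, recurrent_activation=rec_act,
                     implementation=0, unit_forget_bias=True,
                     unroll=True)(emb_item_desc)
    # ...build, compile, and fit a model on rnn_layer1 for each pair...
```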

qinhui99 commented 6 years ago

When I changed the batch_size to 16, SRU got a score of 0.80, but the program became very slow. My log is here:

```
Epoch 1/5
64s - loss: 0.5681 - acc: 0.6878 - val_loss: 0.4531 - val_acc: 0.7918
Epoch 2/5
57s - loss: 0.3319 - acc: 0.8578 - val_loss: 0.4234 - val_acc: 0.8054
Epoch 3/5
57s - loss: 0.2049 - acc: 0.9213 - val_loss: 0.4807 - val_acc: 0.8008
Epoch 4/5
58s - loss: 0.1230 - acc: 0.9554 - val_loss: 0.6687 - val_acc: 0.7855
Epoch 5/5
58s - loss: 0.0675 - acc: 0.9774 - val_loss: 0.8211 - val_acc: 0.7780
```

I don't know why a smaller batch size gets a higher score; perhaps the noisier gradient updates from small batches act as a regularizer. But with such a small batch size the program is very slow.
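Continuing the benchmark sketch from above (same `model`, `x_train`, etc.), the only change was the batch size in the fit call:

```python
# Sketch, continuing the benchmark above: identical model, smaller batch size.
# Smaller batches mean many more (noisier) updates per epoch, hence the slowdown.
model.fit(x_train, y_train, batch_size=16, epochs=5,
          validation_data=(x_test, y_test))
```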

chengshaodi commented 2 years ago

@qinhui99 Is SRU used in the same way as GRU?

qinhui99 commented 2 years ago

Sorry for the late reply. SRU is used basically the same way as GRU. SRU is indeed faster, but its evaluation metrics are much worse than GRU's; the score drops by a fair margin. If you can tolerate the drop in score, it is still usable.