titu1994 / keras-SRU


Keras Simple Recurrent Unit (SRU)

Implementation of the Simple Recurrent Unit in Keras. Paper: Training RNNs as Fast as CNNs
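
As a quick usage sketch (assuming the layer is exposed as a standard Keras recurrent layer named `SRU`, importable from `sru.py` in this repo), it can be dropped in wherever an `LSTM` layer would go:

```python
from keras.models import Sequential
from keras.layers import Embedding, Dense
from sru import SRU  # assumed import path/name for this repo's layer

# Simple sequence classifier: the SRU replaces a plain LSTM layer.
model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=128, input_length=100))
model.add(SRU(128))  # same call signature as keras.layers.LSTM is assumed
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```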

This is a naive implementation with some speed gains over generic LSTM cells; however, it is not yet 10x faster than cuDNN LSTMs.

Issues

It is no longer a problem to have an input dimension that differs from the output dimension.
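
For example (a minimal sketch, again assuming the `SRU` layer from `sru.py` follows the standard Keras recurrent-layer interface), the number of units no longer has to match the number of input features:

```python
from keras.layers import Input
from keras.models import Model
from sru import SRU  # assumed import path/name for this repo's layer

inputs = Input(shape=(50, 300))  # 50 timesteps, 300 features per step
outputs = SRU(128)(inputs)       # 128 units, different from the 300 input features
model = Model(inputs, outputs)
```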

Performance degrades substantially with larger batch sizes (about 6-7% worse on average over 5 runs) compared to a 1-layer LSTM at batch size 128. However, a multi-layer SRU (I have tried 3 layers), while a bit slower than a 1-layer LSTM, reaches around the same score at batch size 32 or 128.

The solution seems to be to stack several SRU layers together; the authors recommend stacks of 4 SRU layers, as in the sketch below.
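
A minimal sketch of such a stack (same assumption as above about the `SRU` import; layer sizes are arbitrary):

```python
from keras.models import Sequential
from keras.layers import Embedding, Dense
from sru import SRU  # assumed import path/name for this repo's layer

model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=128, input_length=100))
# Stack of 4 SRU layers as recommended by the paper's authors; every layer
# except the last returns full sequences so the next layer receives 3D input.
for _ in range(3):
    model.add(SRU(128, return_sequences=True))
model.add(SRU(128))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```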

However, once the batch size is increased to 128, the SRU takes just 7 seconds per epoch compared to the LSTM's 22 seconds. For comparison, CNNs take 3-4 seconds per epoch.