Open XingxingZhang opened 8 years ago
Update:
The results of the unidirectional LSTM are deterministic! It is likely that in the bi-directional LSTM, the forward and the backward LSTMs are computed concurrently and the dropout masks are applied in a non-deterministic order!
Thanks for reporting the issue! We have discovered that there are problems with dropout application in cudnn (the non-determinism that you've discovered, and issues in the weight update), and are looking into it. As a workaround, you can create your network layer-by-layer without dropout, and apply nn.Dropout between the layers.
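For anyone looking for a concrete version of this workaround, here is a minimal sketch in modern PyTorch (the thread predates it, so module names like `nn.LSTM`/`nn.Dropout` reflect the current API, and the sizes and dropout rate are made up for illustration):

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Two single-layer LSTMs with an explicit nn.Dropout between them,
    so cudnn's internal (non-deterministic) dropout path is never used."""
    def __init__(self, input_size, hidden_size, p=0.2):
        super().__init__()
        # dropout=0.0 inside each nn.LSTM keeps the cudnn kernels deterministic
        self.lstm1 = nn.LSTM(input_size, hidden_size, num_layers=1, dropout=0.0)
        self.drop = nn.Dropout(p)  # masks drawn from the seedable torch RNG
        self.lstm2 = nn.LSTM(hidden_size, hidden_size, num_layers=1, dropout=0.0)

    def forward(self, x):
        out, _ = self.lstm1(x)
        out = self.drop(out)
        out, _ = self.lstm2(out)
        return out
```

Because `nn.Dropout` draws its masks from torch's own generator, calling `torch.manual_seed(...)` before each run makes the results reproducible, unlike the cudnn-internal dropout path.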
@ngimel thanks for the feedback and the suggestion.
I have a follow-up question: how is dropout for LSTM designed and implemented in cudnn v5? Are you guys using the strategy in https://arxiv.org/abs/1409.2329 (applying dropout to the input of each layer)?
Yes, we are applying dropout to the input of each layer.
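In other words (a hedged illustration of the arXiv:1409.2329 scheme as usually described, not cudnn's actual code), dropout is applied only to the non-recurrent connections between layers, never to the hidden-to-hidden recurrence:

```python
import torch
import torch.nn as nn

def multilayer_lstm_forward(x, layers, p=0.2, training=True):
    """layers: a list of single-layer nn.LSTM modules (hypothetical helper).
    Dropout hits the activations flowing *between* layers only; each layer's
    own recurrence over time steps is left untouched."""
    out = x
    for i, layer in enumerate(layers):
        if i > 0:  # no dropout on the raw input in this variant
            out = nn.functional.dropout(out, p=p, training=training)
        out, _ = layer(out)
    return out
```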
Sorry for pushing this again: has the non-deterministic behaviour been addressed in newer PyTorch / cudnn versions?
Any progress on this issue?
Hi there,
I am very excited to try out the new LSTM (and bidirectional LSTM) models in torch.cudnn, and they are faster than my own implementation.
However, when I set the dropout rate of the BLSTM to 0.2 (rnn_dropout = 0.2), it produces different results from run to run even with a fixed random seed. When I disable dropout (rnn_dropout = 0), it produces deterministic results. My code is attached.
I assume there may be two reasons for this: 1) the random seed I passed to the BLSTM is not taking effect and cudnn is using its default random generation mechanism; 2) dropout in cudnn is simply not deterministic. However, I didn't find any discussion of the determinism of dropout in the cuDNN User Guide.
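The original attachment is not reproduced above; a minimal sketch of the kind of repro being described might look like the following (written against PyTorch's `nn.LSTM`, whereas the original report may have used the Lua Torch cudnn bindings; the sizes and seed are made up):

```python
import torch
import torch.nn as nn

def run_once(seed=123, dropout=0.2):
    # Reseed everything, rebuild the model, and run on a constant input,
    # so any difference between calls must come from the dropout path.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    blstm = nn.LSTM(32, 64, num_layers=2, dropout=dropout,
                    bidirectional=True).cuda()
    x = torch.ones(10, 4, 32, device="cuda")  # (seq_len, batch, input_size)
    out, _ = blstm(x)
    return out.sum().item()

print(run_once(), run_once())
```

With `dropout=0.2` the two printed sums can differ despite identical seeding, which is the behaviour being reported; with `dropout=0.0` they should match.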