Closed allanj closed 4 years ago
It seems you build an LSTM from the scratch
As described in the paper:
we drop nodes in the LSTM layers (input and recurrent connections), applying the same dropout mask at every recurrent timestep (cf. the Bayesian dropout of Gal & Ghahramani (2015));
It seems you build an LSTM from the scratch