Closed vineetjohn closed 6 years ago
- Apply dropout to the embedded encoder and decoder inputs
- Add a beam search decoder for inference
- Throttle the learning rate. EDIT: Done
- Pass the encoder cell states to the decoder's initial state. This might bypass the disentanglement. EDIT: Done
- Use a softmax temperature for output variability. Not applicable right now
- Switch to SGD from Adam. Adam converges quicker but is prone to over-fitting. EDIT: Done
- Add a cost for repeated tokens? Not applicable right now
- Append the hidden state to every time step. Not applicable right now
- Increase the number of RNN neurons. EDIT: Done
- Reduce the batch size to avoid over-generalization. EDIT: Done
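For the embedding-dropout item, a minimal numpy sketch of inverted dropout applied to a batch of embedded sequences (illustrative only, not the repo's implementation; `keep_prob` and the shapes are assumed):

```python
import numpy as np

def embedding_dropout(embedded, keep_prob, rng):
    """Inverted dropout on embedded token sequences.

    embedded: float array of shape [batch, time, embed_dim].
    Kept activations are scaled by 1/keep_prob so the
    expected value is unchanged at inference time.
    """
    mask = rng.random(embedded.shape) < keep_prob
    return embedded * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((2, 5, 8))          # toy batch of embedded inputs
y = embedding_dropout(x, keep_prob=0.8, rng=rng)
```

In a real pipeline this would be applied to the encoder and decoder embedding lookups during training only.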
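For the beam-search item, a small pure-Python beam search over a toy bigram model (the `step_fn` interface and the toy probability table are assumptions for illustration; a TensorFlow model would supply real next-token log-probabilities):

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width, max_len):
    """Generic beam search.

    step_fn(prefix) -> {token: log_prob} for the next token.
    Returns (sequence, score) for the best finished hypothesis.
    """
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_fn(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Hypotheses ending in end_token are done; others survive.
            (finished if seq[-1] == end_token else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams if needed
    return max(finished, key=lambda c: c[1])

# Toy bigram transition table (assumed, for demonstration only).
TABLE = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"b": math.log(0.9), "</s>": math.log(0.1)},
    "b": {"</s>": math.log(1.0)},
}
best, score = beam_search(lambda seq: TABLE[seq[-1]],
                          "<s>", "</s>", beam_width=2, max_len=5)
# best == ["<s>", "a", "b", "</s>"]
```

With a TensorFlow 1.x graph, `tf.contrib.seq2seq.BeamSearchDecoder` provides the same idea without manual unrolling.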
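For the learning-rate-throttling item, a sketch of staircase exponential decay (the same schedule `tf.train.exponential_decay` implements; the constants here are assumed examples):

```python
def decayed_lr(initial_lr, decay_rate, decay_steps, step):
    """Staircase exponential decay: multiply the learning rate
    by decay_rate once every decay_steps training steps."""
    return initial_lr * decay_rate ** (step // decay_steps)

# Example: halve a 0.01 learning rate every 100 steps.
lr_at_250 = decayed_lr(0.01, 0.5, 100, 250)  # two decays applied
```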
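For the encoder-state item, a tiny numpy vanilla-RNN sketch showing the decoder seeded with the encoder's final hidden state instead of zeros (weights, dimensions, and the shared-parameter simplification are all assumptions; a real model would use separate encoder/decoder cells):

```python
import numpy as np

def rnn(inputs, h0, Wx, Wh, b):
    """Unroll a vanilla tanh RNN; return the final hidden state."""
    h = h0
    for x in inputs:
        h = np.tanh(x @ Wx + h @ Wh + b)
    return h

rng = np.random.default_rng(0)
dim_in, dim_h = 4, 3
params = (rng.normal(size=(dim_in, dim_h)),
          rng.normal(size=(dim_h, dim_h)),
          np.zeros(dim_h))

src = rng.normal(size=(6, dim_in))   # toy source sequence
tgt = rng.normal(size=(6, dim_in))   # toy target sequence

enc_final = rnn(src, np.zeros(dim_h), *params)
# Seed the decoder with the encoder's final state rather than zeros;
# this is what may leak content and "bypass the disentanglement".
dec_final = rnn(tgt, enc_final, *params)
```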
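For the temperature item, a numpy sketch of softmax with temperature (logit values are assumed examples): dividing logits by a temperature above 1 flattens the output distribution, increasing sampling variability, while a temperature below 1 sharpens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax over a vector of logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()               # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

p_sharp = softmax_with_temperature([2.0, 1.0, 0.0], 0.5)  # peaked
p_flat = softmax_with_temperature([2.0, 1.0, 0.0], 2.0)   # flatter
```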
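For the SGD-versus-Adam item, the contrast in a nutshell: plain SGD applies one global learning rate, while Adam adapts a per-parameter step size from gradient moment estimates, which speeds convergence but can over-fit. A one-line SGD update for reference (purely illustrative):

```python
def sgd_step(params, grads, lr):
    """Plain SGD: a single global learning rate for every parameter,
    with none of Adam's per-parameter adaptive scaling."""
    return [p - lr * g for p, g in zip(params, grads)]

new_params = sgd_step([1.0], [0.5], 0.1)
```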
Fixed. Solution was to reduce batch size.
Clip gradients
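A minimal numpy sketch of global-norm gradient clipping, the same scheme as `tf.clip_by_global_norm` (the threshold value is an assumed example): all gradients are rescaled jointly when their combined L2 norm exceeds the threshold, preserving their relative directions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Jointly rescale gradients when their global L2 norm
    exceeds max_norm; otherwise return them unchanged."""
    global_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm

# Example: gradients with global norm 5 clipped down to norm 1.
clipped, gnorm = clip_by_global_norm(
    [np.array([3.0]), np.array([4.0])], max_norm=1.0)
```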