Open iridiumblue opened 5 years ago
Aside from activation regularization, I think this repo implements all the main ideas from AWD-LSTM that pertain to the LSTM itself.
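For anyone wondering what the missing piece would look like: below is a rough sketch of activation regularization (AR) and its temporal variant (TAR) as I understand them from AWD-LSTM, written in plain Python so it runs without any dependencies. This is not code from this repo. The function name, the nested-list layout (one hidden vector per time step), and the `alpha`/`beta` weights are all assumptions for illustration. In a real model you would compute the same quantities with tensor ops on the LSTM outputs and add the result to the training loss.

```python
def activation_regularization(outputs, dropped_outputs, alpha=2.0, beta=1.0):
    """Return the AR + TAR penalty to add to the training loss.

    outputs:         LSTM outputs before dropout, one list of floats per time step.
    dropped_outputs: the same outputs after dropout has been applied.
    alpha, beta:     illustrative weights for the AR and TAR terms (assumed values).
    """
    def mean_square(rows):
        # Mean of squared entries over every time step and hidden unit.
        values = [v * v for row in rows for v in row]
        return sum(values) / len(values) if values else 0.0

    # AR: penalize large activations in the dropped-out output.
    ar = alpha * mean_square(dropped_outputs)

    # TAR: penalize large changes between consecutive time steps.
    diffs = [
        [nxt - prev for prev, nxt in zip(h_prev, h_next)]
        for h_prev, h_next in zip(outputs, outputs[1:])
    ]
    tar = beta * mean_square(diffs)

    return ar + tar


# Toy example: two time steps, hidden size 2.
outputs = [[1.0, 1.0], [3.0, 1.0]]
dropped = [[2.0, 0.0], [0.0, 0.0]]
penalty = activation_regularization(outputs, dropped)
```

The penalty is simply added to the usual cross-entropy loss before calling backward, so it should be possible to bolt on outside the LSTM module without touching the module itself.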
Side note: since I have seen continued success with this version, do you think "AWD-LSTM" really warrants a distinct name? None of its features seems, to me, to be a distinct idea (the way, say, 'Attention' is); rather, the whole thing strikes me as a set of fairly natural refinements.
What say you? (I haven't dug too deeply into the details as my current focus is elsewhere, and your implementation here Just Works well enough that I haven't needed to think about it much ...)
Thank you!
I am putting this to use in my work, which takes a novel-ish approach to text classification based on LSTMs. I'm seeing a pronounced improvement: it appears to 'tame' the model, making it less prone to overfitting and less sensitive to hyperparameters like batch size. A sure and steady march upward.
Quick question: how close does better_lstm get us to AWD-LSTM? At first blush it appears to cover a good deal of the same ground ...