TylerGubala opened this issue 5 years ago (Open)
https://machinelearningmastery.com/exploding-gradients-in-neural-networks
My first thought was this. I'm not entirely sure, though. In my experience, training for too long leads either to gibberish or to quoting the dataset word for word (exploding gradients or overfitting, respectively).
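For what it's worth, the fix that article leans on is gradient clipping. A minimal sketch in plain Keras (illustrative only, not textgenrnn's actual optimizer setup; the layer sizes and learning rate here are made-up values):

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import RMSprop

# toy stand-in for a char-level LSTM like textgenrnn's
model = Sequential([LSTM(128, input_shape=(40, 100)),
                    Dense(100, activation='softmax')])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0, so one bad
# batch can't blow up the weights (the standard exploding-gradient fix)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001, clipnorm=1.0))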
@doneforaiur Thanks for the link! I am relatively new to the deep learning field, though I toy around with projects like this from time to time.
Would you mind correcting your hyperlink? The link text is fine, but the URL itself just points back to the "issues" page with the tag "url".
Do you think that this is potentially an issue where training for too long should simply be avoided, or is there some instability of the model itself?
As far as I know, LSTMs help with the exploding gradients problem. Sadly, I'm not sure. Did you tinker with the "keep probabilities"? Maybe the model's neuron count dips too low? :(
@doneforaiur Sorry, I'm not sure what that means. Is that a parameter that I can feed into the train_from_file function?
https://github.com/karpathy/char-rnn#best-models-strategy
I meant "dropout". I'm sure it's implemented in textgenrnn too.
Ahhhh, I think I got it. The scheduler decays the learning rate linearly toward zero, so it drops too low once the current epoch number gets close to num_epochs. I think it might be something to do with that.
textgenrnn.py -> line 214:

# scheduler function must be defined inline.
def lr_linear_decay(epoch):
    return (base_lr * (1 - (epoch / num_epochs)))
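Plugging hypothetical numbers into that schedule shows how quickly it starves the optimizer (base_lr and num_epochs below are made up for illustration):

base_lr, num_epochs = 0.001, 10

def lr_linear_decay(epoch):
    return base_lr * (1 - (epoch / num_epochs))

# epoch 0 -> 0.001, epoch 5 -> 0.0005, epoch 9 -> 0.0001
# by the final epochs the step size is nearly zero, so later training
# barely changes the weights at all
print([round(lr_linear_decay(e), 6) for e in range(num_epochs)])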
That is quite strange, I guess I'll try looking around and seeing how others do learning rates.
I have the same issue. How do we set the learning rate to be static?
Or even change the base learning rate?
I guess you can just alter the function. Alas, as Max himself has said, tinkering with the base lr doesn't necessarily bring about any improvement :/.
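If you do want it static, one option (a sketch against the snippet quoted above, not tested against the current source) is to make the scheduler ignore the epoch entirely:

from keras.callbacks import LearningRateScheduler

base_lr = 0.001  # hypothetical value; textgenrnn picks its own default

# drop-in replacement for lr_linear_decay: same rate every epoch
def lr_constant(epoch):
    return base_lr

lr_callback = LearningRateScheduler(lr_constant, verbose=1)
# textgenrnn wires up its callbacks internally, so editing the decay
# function in textgenrnn.py is the simpler route; the callback above is
# just what that replacement looks like in plain Keras terms.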
More text than needed means more loss; more training than needed means more loss.
I've been loving this utility and have been amused with the results so far.
Something strange that I've been noticing, though: I've given it various texts and it seems like it starts choking after several training epochs.
Example code:
Output:
It seems like it goes a bit off the rails. Interestingly, I was watching it for a while as it trained, and it seemed like the loss was increasing as it worked through the text, which was odd to me.
The training document that I used is attached; I'm not sure if it needs to follow some rule. It's around 16 MB, so I figured that's large enough?
Thanks in advance!
quoteraw.txt