Maelstrom2014 opened this issue 1 month ago
I think what you are describing is how the visualization normally works without the `normalize_y` option, but I suppose it should probably be called something different.
I have not heard of lowering the actual LR manually; I'd imagine that's handled by the optimizer's scheduler settings, like cosine or decay? I don't really have enough experience with all this to say, though.
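For reference, the cosine schedule mentioned above typically means the LR follows half a cosine wave from the base LR down to a minimum. A minimal sketch of that formula (the function name `cosine_lr` and the parameter values are my own, not from any particular trainer):

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealing schedule: starts at base_lr, decays smoothly to min_lr."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at the base LR and reaches the minimum at the final step.
print(cosine_lr(0, 1000, 1e-4))     # 1e-4
print(cosine_lr(1000, 1000, 1e-4))  # 0.0
```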
Thanks a lot for the great trainer! For the first time I managed to train with 16 GB VRAM on a 768 px bucket!
I think it's not working as it should.
It should be something like this: `plt.ylim(min(loss_values), max(loss_values))`
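A runnable sketch of that fix, using made-up loss values in place of the trainer's recorded losses:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Hypothetical loss values standing in for the trainer's logged losses.
loss_values = [0.9, 0.7, 0.55, 0.5, 0.48, 0.47]

fig, ax = plt.subplots()
ax.plot(loss_values)
# Clamp the y-axis to the actual data range instead of matplotlib's padded default.
ax.set_ylim(min(loss_values), max(loss_values))
print(ax.get_ylim())  # (0.47, 0.9)
```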
Is there any reason to lower the learning rate after 1000 steps? And how can I do that with nodes?
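I don't know which node setup the trainer uses for this, but conceptually "lower the LR after 1000 steps" is just a step schedule. A generic sketch (function name, drop point, and factor are all illustrative, not from the trainer):

```python
def stepped_lr(step, base_lr=1e-4, drop_at=1000, factor=0.1):
    """Keep base_lr for the first `drop_at` steps, then multiply it by `factor`."""
    return base_lr if step < drop_at else base_lr * factor

print(stepped_lr(999))   # 1e-4
print(stepped_lr(1000))  # 1e-5
```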