Hi there!
I ran into the same problem as #18 while training the model, and in my case it was even more severe: my training accuracy dropped from 90% to 50%. Although it recovered and climbed to a new best accuracy after a few hundred iterations, I am still confused by this behavior.
Perhaps the way the validation set is split is the reason, but why does L-Softmax suffer such a large drop when the net is tested? A drop like this is rare with a normal softmax.
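To make the comparison with a normal softmax concrete, here is a minimal NumPy sketch of the margin function from the L-Softmax paper, ψ(θ) = (-1)^k cos(mθ) - 2k for θ in [kπ/m, (k+1)π/m]. I wrote it myself; it is not code from this repository. It assumes the reported accuracy is computed on the margin-adjusted target logit, and it ignores any λ annealing. I have not verified either assumption. With m = 4, a sample that a plain softmax classifies correctly is counted as wrong once the margin is applied.

```python
import numpy as np

def psi(theta, m):
    """L-Softmax margin function: psi(theta) = (-1)^k cos(m*theta) - 2k,
    for theta in [k*pi/m, (k+1)*pi/m], k = 0..m-1."""
    k = np.floor(theta * m / np.pi)
    k = np.clip(k, 0, m - 1)  # theta == pi belongs to the last interval
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def logits(x, W, y, m=None):
    """Plain logits W^T x (columns of W are class weights).
    If m is given, replace the target logit with ||W_y|| ||x|| psi(theta_y).
    Assumption: no lambda annealing is applied here."""
    z = W.T @ x
    if m is not None:
        norms = np.linalg.norm(W[:, y]) * np.linalg.norm(x)
        cos_t = np.clip(z[y] / norms, -1.0, 1.0)
        z[y] = norms * psi(np.arccos(cos_t), m)
    return z

# Hypothetical 2-D example: class 0 is the true label.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x = np.array([1.0, 0.6])
print(np.argmax(logits(x, W, y=0)))       # plain softmax prediction
print(np.argmax(logits(x, W, y=0, m=4)))  # prediction with the margin applied
```

Since ψ(θ) never exceeds cos θ, the margin can only lower the target logit. If the accuracy layer sees those lowered logits, its numbers are not directly comparable with a normal softmax.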
Thank you for your help!