yuantailing / ctw-baseline

Baseline methods for [CTW dataset](https://ctwdataset.github.io/)
MIT License

training for classification does not converge #19

Open zwangab91 opened 6 years ago

zwangab91 commented 6 years ago

I tried to train the classification models for AlexNet and Inception with the hyperparameters in train.py ('learning_rate_decay_type': 'exponential', 'learning_rate': '0.01', 'learning_rate_decay_factor': '0.1'), but the loss fluctuates around 6 and 11 for the two models, respectively. I tried tuning the learning rate in the range 1e-5 to 0.1, but training still shows no sign of convergence (even after 10,000 steps). Could you share the hyperparameters you used to train the classification models, so I can reproduce the results, and the final values of the cross-entropy loss?
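For reference, here is a minimal sketch of what an exponential (staircase) decay with those values would look like. The decay interval (decay_steps) below is an assumed placeholder for illustration, not a value taken from train.py:

```python
# Hypothetical illustration of exponential learning-rate decay using the
# quoted hyperparameters. decay_steps is an assumed placeholder; the actual
# decay interval is set by the training script, not by this sketch.
def decayed_lr(step, initial_lr=0.01, decay_factor=0.1, decay_steps=25_000):
    """Staircase decay: lr = initial_lr * decay_factor ** (step // decay_steps)."""
    return initial_lr * decay_factor ** (step // decay_steps)

for step in (0, 10_000, 25_000, 50_000, 100_000):
    print(f"step {step:>6}: lr = {decayed_lr(step):.6f}")
```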

[Screenshot: training loss curve, 2018-07-27]
yuantailing commented 6 years ago

1) We didn't tune the hyper-parameters. The hyper-parameters we used are what you find in the git repo.
2) I forget the final cross-entropy loss, but as far as I know the loss is only the cross-entropy loss.
3) 10,000 steps is far from convergence; we trained for 100,000 steps. One epoch is 800,000 / 64 = 12,500 steps, so don't expect the net to learn well before one epoch.

Please be patient; I believe you can reproduce the exact results (the only source of variation is the random seed) without any modification.
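A quick back-of-the-envelope check of the training budget described above (the sample count and batch size come from the comment; nothing else is assumed):

```python
# Sanity check of the training schedule quoted above.
num_train_samples = 800_000  # training instances, per the comment above
batch_size = 64

steps_per_epoch = num_train_samples // batch_size  # 12,500 steps per epoch
total_steps = 100_000                              # total steps trained
epochs_trained = total_steps / steps_per_epoch     # 8 epochs

print(steps_per_epoch, epochs_trained)  # 12500 8.0
```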

zwangab91 commented 6 years ago

Thanks! The loss did drop to around 2 after 5 epochs of training.