Open Franklin-Yao opened 4 years ago
Training is unstable in the initial stage, which may harm model performance. We use auxiliary training to address this, and we decay the weight of the auxiliary training loss in later epochs.
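For readers following along, here is a minimal sketch of what a decayed auxiliary loss weight could look like. It is not taken from the repository: the linear schedule, the function names `aux_loss_weight` and `total_loss`, and the parameters `decay_epochs` and `initial_weight` are all assumptions made for illustration. The actual code may use a different schedule.

```python
def aux_loss_weight(epoch: int, decay_epochs: int, initial_weight: float = 1.0) -> float:
    """Hypothetical schedule: linearly decay the auxiliary weight to 0.

    The weight starts at `initial_weight` at epoch 0 and reaches 0 at
    `decay_epochs`; it stays at 0 for all later epochs.
    """
    if decay_epochs <= 0:
        return 0.0
    remaining = max(0.0, 1.0 - epoch / decay_epochs)
    return initial_weight * remaining


def total_loss(main_loss: float, aux_loss: float, epoch: int, decay_epochs: int) -> float:
    """Combine the main loss with the down-weighted auxiliary loss."""
    return main_loss + aux_loss_weight(epoch, decay_epochs) * aux_loss


if __name__ == "__main__":
    # Print how the auxiliary weight shrinks over training (assumed 10 decay epochs).
    for epoch in (0, 5, 10, 20):
        print(epoch, aux_loss_weight(epoch, decay_epochs=10))
```

With a schedule like this, the auxiliary term dominates early, when the main objective is unstable, and contributes nothing once the decay period ends, so the final model is driven only by the main loss.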
Why does this auxiliary training help stabilize training? The loss may cause your model to fit the validation dataset and give lower accuracy on the test dataset.
Hi,
I saw that you updated your code and added an auxiliary classifier. Why does it stabilize training? Where did you get this idea?