Closed river-afk closed 5 years ago
It is a good question. It is really hard to decide which model should be used for the next iteration. However, I don't validate all models and pick the best one; I train the model for 120,000 iterations and only use the last snapshot.
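In code, the "just take the last snapshot" strategy could look like this minimal sketch. Note the `model_iter_<N>.pth` naming is a hypothetical convention for illustration, not necessarily what this repo's training script produces:

```python
import os
import re

def last_checkpoint(ckpt_dir):
    """Return the snapshot saved at the highest iteration count.

    Assumes snapshots are named like 'model_iter_40000.pth'
    (hypothetical naming; adapt the pattern to your training script).
    """
    pattern = re.compile(r"model_iter_(\d+)\.pth$")
    best_iter, best_path = -1, None
    for name in os.listdir(ckpt_dir):
        m = pattern.search(name)
        if m and int(m.group(1)) > best_iter:
            best_iter = int(m.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_path
```

With snapshots saved every 40,000 iterations up to 120,000, this simply returns the 120,000-iteration checkpoint, with no validation pass over earlier snapshots.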
Thanks for the quick reply. Very interesting.
That implies the last model always performs better than the initial one from the previous iteration. Does that hold in all of your experiments? Do you always observe stable results using the last models, i.e., if you train the framework multiple times, do you achieve a consistent improvement after each SSL iteration?
Actually, the last model (at 120,000 iterations for SSL) is not always the best, but it is almost always close to the best. I think the result is stable when I choose 120,000 iterations for SSL. I did run the same experiments several times and achieved similar performance. The results shown in the paper are not the best ones but the "average" ones picked from one experiment. However, if we train the model without SSL, the overfitting problem is much more severe, so choosing 80,000 iterations is very important.
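For contrast, the alternative raised in the original question (validate every saved snapshot and pick the best) could be sketched as below. The `evaluate` callback is hypothetical: it stands in for whatever validation metric you use (e.g. mIoU on a held-out set), which is not part of this thread:

```python
def pick_best_snapshot(snapshot_paths, evaluate):
    """Validate every saved snapshot and return the best one.

    `snapshot_paths` is a list of checkpoint paths; `evaluate` is a
    user-supplied function (hypothetical here) that returns a scalar
    validation score for a checkpoint, where higher is better.
    """
    best_path, best_score = None, float("-inf")
    for path in snapshot_paths:
        score = evaluate(path)  # e.g. run the val set, compute mIoU
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score
```

This is more expensive (one validation pass per snapshot) and, per the replies above, the author found the cheaper fixed-iteration choice to be stable enough in practice.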
Hello @liyunsheng13 , thank you very much for the code. I have some questions about Algorithm 1 in your paper:
How do you select the M^k_i model (trained with Eqn 3) that will be used for the next iteration? I imagine that you validate all snapshots of M^k_i (saved during training) and pick the best one? A similar question applies to F^k (trained with Eqn 2) and M^k_0 (trained with Eqn 1).
Thanks