gaow0007 opened this issue 5 years ago
I have a question about the number of training epochs. Why do you use 600-2000 epochs to validate the superiority of your method? That number seems very large; I usually train on these tiny datasets for only around 200 epochs. Is there a particular reason for this setting?

Best

We achieved better results when running for more epochs (for both manifold mixup and our baselines), but we definitely saw an improvement with manifold mixup over input mixup at 600 epochs. I think it helps with an even smaller number of epochs (<600), but I don't recall whether I've actually run that experiment.

Another way to think about this is that Manifold Mixup is a stronger regularizer, and hence you need more training epochs to train with it.
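For anyone skimming this thread, here is a minimal sketch of the mechanism being discussed: input mixup mixes raw inputs, while manifold mixup mixes hidden representations at a randomly chosen layer and applies the same mixing coefficient to the loss. This is not the authors' reference code; the toy network, `alpha=2.0`, and the set of eligible layers are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy MLP whose forward pass can mix hidden states at a chosen layer."""
    def __init__(self, in_dim=32, hidden=64, n_classes=10):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
            nn.Linear(hidden, n_classes),
        ])

    def forward(self, x, mixup_layer=None, lam=None, index=None):
        h = x
        for k, layer in enumerate(self.layers):
            # Mixing before layer 0 is ordinary input mixup; mixing before a
            # deeper layer mixes hidden representations (manifold mixup).
            if mixup_layer is not None and k == mixup_layer:
                h = lam * h + (1.0 - lam) * h[index]
            h = layer(h)
        return h

def manifold_mixup_step(model, x, y, alpha=2.0):
    """One training step: mix at a random layer, mix the loss the same way."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))          # pairing of examples in the batch
    k = random.randrange(len(model.layers))    # eligible layers; 0 = input mixup
    logits = model(x, mixup_layer=k, lam=lam, index=index)
    # Same convex combination applied to the two sets of labels.
    return lam * F.cross_entropy(logits, y) + \
           (1.0 - lam) * F.cross_entropy(logits, y[index])

model = SmallNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
loss = manifold_mixup_step(model, x, y)
loss.backward()
opt.step()
```

Because the interpolation constrains intermediate representations (not just inputs), it acts as a stronger regularizer, which is consistent with needing more epochs to fully benefit from it.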