kjappelbaum opened this issue 2 years ago
-> learning curves are noisier for a higher number of epochs (= more overfitting risk) -> however, the best performance is also reached at the higher epoch counts
I'm now going to try deep ensembles. A quick search didn't show anyone doing this for fine-tuning (?), which is surprising.
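For reference, a minimal sketch of what I mean by a deep ensemble over fine-tuning runs: fine-tune several copies of the same pretrained checkpoint with different random seeds, then average their predictions. This is only an illustration, assuming a PyTorch-style classification setup; `make_pretrained_model` and `train_loader` are placeholder names for whatever checkpoint and data pipeline we actually use.

```python
import torch

def fine_tune_member(model, loader, epochs, seed):
    # Each ensemble member starts from the same pretrained weights but gets
    # its own seed, so data order and dropout differ between runs.
    torch.manual_seed(seed)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model

def ensemble_predict(models, x):
    # Average the softmax outputs of the independently fine-tuned members;
    # the spread across members also gives a cheap uncertainty estimate.
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

# Placeholders: swap in the real checkpoint loader and data loader.
members = [
    fine_tune_member(make_pretrained_model(), train_loader, epochs=8, seed=s)
    for s in range(5)
]
```

Averaging should smooth out the run-to-run noise we see in the single-run learning curves, at the cost of one fine-tuning run per member.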
I think this is the reason for the "peaks" we sometimes see in the learning curve. We seem to be able to shift them by tuning how many epochs we fine-tune for.
There are different options: