kjappelbaum opened this issue 2 years ago
-> learning curves are noisier for a higher number of epochs (= more overfitting risk) -> however, the best performance is also reached at the higher epoch counts
I'm now going to try deep ensembles. A quick search didn't show anyone doing this for fine-tuning (?), which is surprising.
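For reference, a minimal sketch of what I mean by a deep ensemble over fine-tuning runs: fine-tune several copies of the same pretrained checkpoint with different random seeds, then average their predictions. This is only an illustration, assuming a PyTorch-style classification setup; `make_pretrained_model` and `train_loader` are placeholder names for whatever checkpoint and data pipeline we actually use.

```python
import torch

def fine_tune_member(model, loader, epochs, seed):
    # Each ensemble member starts from the same pretrained weights but gets
    # its own seed, so data order and dropout differ between runs.
    torch.manual_seed(seed)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model

def ensemble_predict(models, x):
    # Average the softmax outputs of the independently fine-tuned members;
    # the spread across members also gives a cheap uncertainty estimate.
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

# Placeholders: swap in the real checkpoint loader and data loader.
members = [
    fine_tune_member(make_pretrained_model(), train_loader, epochs=8, seed=s)
    for s in range(5)
]
```

Averaging should smooth out the run-to-run noise we see in the single-run learning curves, at the cost of one fine-tuning run per member.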
I think this is the reason for the "peaks" we sometimes see in the learning curve. We seem to be able to shift them by tuning how many epochs we fine-tune for.
There are different options: