odissei-lifecourse / life-sequencing-dutch

MIT License

Training Epoch estimation for overfitting #15

Open tanzir5 opened 5 months ago

tanzir5 commented 5 months ago

We need to estimate when the model starts overfitting during pretraining. We should have a table like this:

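| Epoch | Training Total | Validation Total | Training MLM | Validation MLM | Training CLS | Validation CLS |
|---|---|---|---|---|---|---|
| 0 | | | | | | |
| 1 | | | | | | |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| ... | | | | | | |

tanzir5 commented 5 months ago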

Multiple GPUs are almost certainly needed for this. For now, we can get an estimate by tomorrow (Apr 17) using one GPU.

tanzir5 commented 4 months ago

We did an initial run for 30 epochs on 5% of the population. Here are the results:

Validation losses for the LLM (epoch: loss): 0: 3311, 5: 2571, 10: 2480, 15: 2397, 20: 4909, 21: 5200, 23: 5100, 25: 6100, 30: 4943, 31: 4900

Training losses (epoch: loss): 0: 14445, 5: 11000, 10: 10200, 14: 10000, 15: 9978, 16: 14760, 17: 22300, 18: 25000, 19: 26000, 20: 22000

MLM loss, CLS loss (epoch: MLM, CLS): 0: 16600, 5700; 5: 12400, 5360; 10: 11400, 5200; 15: 11200, 5200; 16: 17000, 5400; 17: 26000, 5700
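
To read a turning point off these numbers, here is a minimal sketch that finds the recorded epoch with the lowest loss and the first recorded epoch where the loss goes up. The validation and training losses are copied from the run above. The helper names `best_epoch` and `first_rise` and the `rel_tol` parameter are hypothetical and not part of the repository. Epochs that were not logged are simply skipped, so the true turning point may lie between two recorded epochs.

```python
# Losses copied from the 5% run reported above (epoch -> loss).
val_loss = {0: 3311, 5: 2571, 10: 2480, 15: 2397, 20: 4909,
            21: 5200, 23: 5100, 25: 6100, 30: 4943, 31: 4900}
train_loss = {0: 14445, 5: 11000, 10: 10200, 14: 10000, 15: 9978,
              16: 14760, 17: 22300, 18: 25000, 19: 26000, 20: 22000}


def best_epoch(losses):
    """Return the recorded epoch with the lowest loss (hypothetical helper)."""
    return min(losses, key=losses.get)


def first_rise(losses, rel_tol=0.0):
    """Return the first recorded epoch whose loss exceeds the previously
    recorded loss by more than rel_tol (relative), or None if it never rises.
    """
    epochs = sorted(losses)
    for prev, cur in zip(epochs, epochs[1:]):
        if losses[cur] > losses[prev] * (1 + rel_tol):
            return cur
    return None


if __name__ == "__main__":
    for name, series in (("validation", val_loss), ("training", train_loss)):
        print(name, "best epoch:", best_epoch(series),
              "first rise:", first_rise(series))
```

On the recorded points, both series reach their minimum at epoch 15. Because the training loss rises after that point as well, this check alone cannot tell overfitting apart from unstable training. The full per-epoch table requested at the top of the issue would help separate the two.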

tanzir5 commented 4 months ago

Next, we need to train using the entire population and get this estimate.