tanzir5 opened 5 months ago
Multiple GPUs are almost certainly needed for this. For now, we can produce an estimate tomorrow (Apr 17) using a single GPU.
We did an initial run for 30 epochs on 5% of the population. Here are the results:
Validation losses for the LLM:

| Epoch | Validation loss |
|------:|----------------:|
| 0     | 3311 |
| 5     | 2571 |
| 10    | 2480 |
| 15    | 2397 |
| 20    | 4909 |
| 21    | 5200 |
| 23    | 5100 |
| 25    | 6100 |
| 30    | 4943 |
| 31    | 4900 |

Training losses (total, with MLM and CLS components where recorded):

| Epoch | Training loss | MLM loss | CLS loss |
|------:|--------------:|---------:|---------:|
| 0     | 14445 | 16600 | 5700 |
| 5     | 11000 | 12400 | 5360 |
| 10    | 10200 | 11400 | 5200 |
| 14    | 10000 |       |      |
| 15    | 9978  | 11200 | 5200 |
| 16    | 14760 | 17000 | 5400 |
| 17    | 22300 | 26000 | 5700 |
| 18    | 25000 |       |      |
| 19    | 26000 |       |      |
| 20    | 22000 |       |      |
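As a quick sanity check on where the validation loss bottoms out, here is a minimal Python sketch. The loss values are copied from the numbers reported above; the snippet is not part of the training code, just a way to read off the early-stopping point:

```python
# Reported validation losses by epoch (copied from the run above).
val_losses = {
    0: 3311, 5: 2571, 10: 2480, 15: 2397, 20: 4909,
    21: 5200, 23: 5100, 25: 6100, 30: 4943, 31: 4900,
}

# The epoch with the lowest validation loss is the natural
# early-stopping / checkpoint-selection point.
best_epoch = min(val_losses, key=val_losses.get)
print(f"best epoch: {best_epoch}, val loss: {val_losses[best_epoch]}")
# -> best epoch: 15, val loss: 2397
```

By this reading, validation loss improves through epoch 15 and degrades afterward, which matches the training-loss jump at epochs 16-17.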
Next, we need to train on the entire population and produce the same estimate there.
We need to estimate when the model starts overfitting during pretraining. We should have a table like this: