Open shivammehta25 opened 1 year ago
"All models in the ablation study were trained up to 300k steps" From paper.
So the available checkpoint in the GitHub repository is also trained up to 300k steps?
It must be. The training set has 12.5k samples, so 300k steps works out to roughly 1,500 passes (epochs) over the whole dataset. It may be more correct to aim for that number of epochs rather than a fixed step count, since the step count depends on your training set size and batch size.
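For anyone adapting this to their own data, here is a rough sketch of the steps-to-epochs conversion. The batch size is not stated anywhere in this thread, so the value 64 below is only an assumption (it is the power of two that roughly matches the ~1500 figure), and the function names are made up for illustration.

```python
import math


def steps_to_epochs(steps: int, dataset_size: int, batch_size: int) -> float:
    """Convert optimizer steps into passes over the dataset."""
    # One epoch takes this many steps (the last batch may be partial).
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps / steps_per_epoch


def epochs_to_steps(epochs: int, dataset_size: int, batch_size: int) -> int:
    """Convert a target number of epochs into optimizer steps."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return epochs * steps_per_epoch


# 300k steps on 12.5k samples; batch_size=64 is an ASSUMPTION, not from the paper.
print(steps_to_epochs(300_000, 12_500, 64))

# The same ~1500 epochs on the same data with a hypothetical batch size of 32.
print(epochs_to_steps(1_500, 12_500, 32))
```

With a different dataset or batch size, matching the epoch count rather than the 300k step count keeps the amount of training per sample comparable.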
Hello,
Thank you for the fantastic work :D On loading the checkpoint, I see that the value for the iteration key is set to zero. How long, and with what batch size, was the provided pretrained model trained?