Closed lan-lw closed 4 months ago
Try reducing the validation interval to 50 or 100 steps and retraining the model.
In our experience, with Giga various measures are needed to avoid overfitting.
However, proposing such measures is beyond the scope of this study, so we used the same training parameters for all runs.
With Giga as the backbone, training a phi model with LinCIR can converge very quickly.
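As a sketch of the advice above (validate every 50–100 steps and keep the best checkpoint so the fast convergence is caught before overfitting degrades the model), here is a minimal generic training-loop outline. `model`, `train_batches`, and `validate` are hypothetical stand-ins for illustration, not LinCIR's actual API:

```python
def train_with_frequent_validation(model, train_batches, validate,
                                   validation_steps=50, max_steps=2000):
    """Validate every `validation_steps` steps; return the best score and state.

    Frequent validation catches very fast convergence (and the
    overfitting that follows) before the model degrades.
    """
    best_score, best_state = float("-inf"), None
    for step, batch in enumerate(train_batches, start=1):
        model.train_step(batch)  # hypothetical single optimization step
        if step % validation_steps == 0:
            score = validate(model)  # hypothetical validation metric
            if score > best_score:
                best_score, best_state = score, model.state()
        if step >= max_steps:
            break
    return best_score, best_state
```

With `validation_steps=50` instead of a larger interval, the best checkpoint is snapshotted close to the convergence point rather than after the model has started to overfit.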
Got it, thanks!
Hi, I tried to pretrain a phi model with a ViT-G backbone, but the results are not as good. Could you provide a pretrained model with ViT-G as the backbone?