QiushiYang closed this issue 3 years ago
The hyperparameters for pre-training and fine-tuning are set differently. Also, if you are training from scratch, you need to choose hyperparameters that suit that setting. The hyperparameters for pre-training and fine-tuning can be found in the paper.
Thanks a lot for your reply. It was a coding bug, and I have fixed it. Many thanks.
Thank you so much for sharing your code. I am trying to use ViT as the encoder, followed by a common decoder, to build a segmentation network. I am training it from scratch, but the loss does not decrease from the very start of training, and the results stay near 0. Is there any trick to training ViT correctly? Is it essential to load the pre-trained model and fine-tune it? Here is my configuration:
patch_size=16
hidden_size=16*16*3
mlp_dim=3072
dropout_rate=0.1
num_heads=12
num_layers=12
lr=3e-4
opt=Adam
weight_decay=0.0
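For anyone checking a similar setup, here is a minimal sketch that writes the configuration above as a Python dataclass and derives the shapes it implies. The names `ViTConfig` and `derived_shapes` are hypothetical and not from the repository, and `image_size=224` is an assumption, since the post does not state the input resolution. It only checks that the sizes are consistent and does not diagnose the training problem itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the values posted above (opt=Adam omitted)."""
    patch_size: int = 16
    hidden_size: int = 16 * 16 * 3
    mlp_dim: int = 3072
    dropout_rate: float = 0.1
    num_heads: int = 12
    num_layers: int = 12
    lr: float = 3e-4
    weight_decay: float = 0.0


def derived_shapes(cfg: ViTConfig, image_size: int = 224) -> dict:
    """Derive shapes implied by the config.

    image_size is an assumption; the original post does not give it.
    """
    if image_size % cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if cfg.hidden_size % cfg.num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    patches_per_side = image_size // cfg.patch_size
    return {
        "hidden_size": cfg.hidden_size,                   # 16 * 16 * 3
        "head_dim": cfg.hidden_size // cfg.num_heads,     # per-head width
        "num_patches": patches_per_side ** 2,             # tokens per image
        "mlp_ratio": cfg.mlp_dim / cfg.hidden_size,       # MLP expansion factor
    }


shapes = derived_shapes(ViTConfig())
print(shapes)
```

Under the assumed 224x224 input, this gives a hidden size of 768, 64 dimensions per head, 196 patch tokens per image, and an MLP expansion factor of 4.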