zhongyy / Face-Transformer

Face Transformer for Recognition
MIT License

Reproduce results using training #3

Open khawar-islam opened 3 years ago

khawar-islam commented 3 years ago

Thank you @zhongyy for your pre-trained models. I verified all ViT-P12S8 results in the paper, and all of them match. Thank you @zhongyy. Judging from the model's file name, I assume you stopped training at Epoch_2_Batch_12000. Am I right?

I would also like to ask: if I don't use the pre-trained models and instead train from scratch, is it possible to reproduce the results?

I am running CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VITs -head CosFace --outdir ./results/ViT-P12S8_ms1m_cosface_s1 --warmup-epochs 1 --lr 3e-4

Still, the accuracy is stuck around 50%:

highest_acc: [0.5375, 0.538, 0.5048333333333334, 0.517, 0.5651428571428572, 0.5091666666666667]
Epoch 1 Batch 15110 Speed: 6.00 samples/s   Training Loss 33.7626 (33.8095) Training Prec@1 0.000 (0.000)
Epoch 1 Batch 15120 Speed: 208.26 samples/s Training Loss 33.7692 (33.7805) Training Prec@1 0.000 (0.000)
Learning rate 0.000001
Perform Evaluation on ['lfw', 'talfw', 'calfw', 'cplfw', 'cfp_fp', 'agedb_30'] , and Save Checkpoints...
(12000, 512)
zhongyy commented 3 years ago

Hi, @khawar512. I think you can train the model and reproduce the results yourself. The "Epoch_2_Batch_12000" checkpoint was not saved at lr=3e-4 but at lr=1e-4. You can first train the model with lr=3e-4 for 20 epochs, and then finetune from the saved model with lr=1e-4 for about 5 epochs. It will probably take 5-7 days on 4 Tesla V100s. Also, I notice your model is still in the first epoch (warmup). :) Maybe you can train for a longer time and check the performance again.
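The schedule described above (one warmup epoch at a tiny learning rate, then lr=3e-4 for about 20 epochs, then a finetune stage at lr=1e-4 for about 5 epochs) can be sketched as a simple per-epoch function. This is a hedged illustration, not the repository's actual scheduler: the exact warmup shape and the function name `lr_for_epoch` are assumptions.

```python
def lr_for_epoch(epoch, base_lr=3e-4, finetune_lr=1e-4,
                 warmup_epochs=1, main_epochs=20):
    """Return the learning rate for a 0-indexed epoch.

    Stage 1: linear warmup toward base_lr (warmup shape is an assumption).
    Stage 2: constant base_lr for `main_epochs` epochs.
    Stage 3: constant finetune_lr (the finetune run zhongyy describes).
    """
    if epoch < warmup_epochs:
        # Ramp up from a small value toward base_lr during warmup.
        return base_lr * (epoch + 1) / (warmup_epochs + 1)
    if epoch < warmup_epochs + main_epochs:
        return base_lr
    return finetune_lr


# Example: inspect the schedule over the ~26 epochs of the full recipe.
schedule = [lr_for_epoch(e) for e in range(26)]
```

This matches the log above: during the warmup epoch the printed learning rate is far below 3e-4, which is why accuracy is still near chance.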

khawar-islam commented 3 years ago

Thank you @zhongyy. Actually, I am using only two GPUs, so training takes a lot of time.

XinWangg commented 3 years ago

@khawar512 Have you reproduced the result?

khawar-islam commented 3 years ago

@XinWangg The author shared a trained model, and I have verified all the results; they all match the paper. He also shared all the training steps, but training on 1 million images literally takes a lot of time and GPUs. I am trying to figure out how to train on a smaller dataset.