microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Hyperparameter about reproducing imagenet1k result on BEiT-large-512 #370

Closed winterfell2021 closed 3 years ago

winterfell2021 commented 3 years ago

Describe the model I am using: BEiT. I wonder how to reproduce acc@1 88.60 on ImageNet-1k. I used the hyperparameters from get_started_for_image_classification.md:

!OMP_NUM_THREADS=1 CUDA_HOME=/opt/cuda python -m torch.distributed.launch --nproc_per_node=2 run_class_finetuning.py \
    --model beit_large_patch16_512 --data_path /dataset/imagenet \
    --finetune https://unilm.blob.core.windows.net/beit/beit_large_patch16_224_pt22k_ft22k.pth \
    --output_dir ./result/beit_large_patch16_512 --batch_size 4 --lr 2e-5 --update_freq 64 \
    --warmup_epochs 5 --epochs 30 --layer_decay 0.9 --drop_path 0.4 \
    --weight_decay 1e-8 --enable_deepspeed --input_size 512

But I got a bad result training on 2x RTX 3090. The only changes I made were reducing the batch size from 32 to 4 and raising update_freq from 2 to 64, following the effective batch size note. Is that the problem? Could you share the hyperparameters that produced the best evaluation result on BEiT-large-512? Thanks a lot.
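For reference, here is the sanity check I did on the effective batch size; I assume the reference recipe (batch_size 32, update_freq 2) is meant for 8 GPUs, which is only my guess:

# Effective batch size = per-GPU batch size * gradient-accumulation steps (update_freq) * number of GPUs.
def effective_batch_size(batch_size, update_freq, num_gpus):
    return batch_size * update_freq * num_gpus

print(effective_batch_size(32, 2, 8))  # reference recipe, assumed 8 GPUs -> 512
print(effective_batch_size(4, 64, 2))  # my setting on 2x RTX 3090 -> 512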

donglixp commented 3 years ago

Hi @winterfell2021 ,

You could first rerun the evaluation step using our provided checkpoint, as shown in https://github.com/microsoft/unilm/blob/master/beit/get_started_for_image_classification.md#evaluate-our-fine-tuned-checkpoints. If you obtain the expected results, you can then follow the fine-tuning instructions at https://github.com/microsoft/unilm/blob/master/beit/get_started_for_image_classification.md#fine-tuning.

donglixp commented 3 years ago

We used the same hyperparameters as in the given instructions, without any specific sweep for 512x512. Could you also share the training logs and TensorBoard records so that we can figure out what the difference is?

winterfell2021 commented 3 years ago

@donglixp Thanks for your reply. I have rerun the evaluation step and reproduced the expected result. Here is my fine-tuning log:

{"train_lr": 7.142857142857128e-05, "train_min_lr": 5.37531041546429e-08, "train_loss": 1.383886338211596, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.3436247954001794, "test_acc1": 44.55958624330827, "test_acc5": 84.45595985373068, "epoch": 0, "n_parameters": 304653828}
{"train_lr": 0.00028571428571428454, "train_min_lr": 2.1501241661857156e-07, "train_loss": 1.339564881908397, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1861090302467345, "test_acc1": 61.39896523273053, "test_acc5": 85.49222918495613, "epoch": 1, "n_parameters": 304653828}
{"train_lr": 0.0004999999999999996, "train_min_lr": 3.7627172908250026e-07, "train_loss": 1.2338731759227812, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.9702010686580952, "test_acc1": 75.12953575410991, "test_acc5": 90.93264351484072, "epoch": 2, "n_parameters": 304653828}
{"train_lr": 0.0007142857142857145, "train_min_lr": 5.375310415464306e-07, "train_loss": 1.1430357340723276, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.7757947073532985, "test_acc1": 75.64767062725798, "test_acc5": 94.0414515678129, "epoch": 3, "n_parameters": 304653828}
{"train_lr": 0.0009285714285714295, "train_min_lr": 6.987903540103559e-07, "train_loss": 1.108779653441161, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.6567290494075189, "test_acc1": 81.86528720756887, "test_acc5": 96.89119218421106, "epoch": 4, "n_parameters": 304653828}
{"train_lr": 0.0009997746176942472, "train_min_lr": 7.523738481852174e-07, "train_loss": 1.069022581524526, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.7250815341105827, "test_acc1": 74.09326594851795, "test_acc5": 93.00518229588326, "epoch": 5, "n_parameters": 304653828}
{"train_lr": 0.0009977477875754866, "train_min_lr": 7.508485704385355e-07, "train_loss": 1.1153367947166164, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.8943284245637747, "test_acc1": 77.72020893393403, "test_acc5": 93.00518241447489, "epoch": 6, "n_parameters": 304653828}
{"train_lr": 0.000993298416218829, "train_min_lr": 7.475002251311409e-07, "train_loss": 1.1971247619949281, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.023184599326207, "test_acc1": 60.880830063103396, "test_acc5": 84.45595967584323, "epoch": 7, "n_parameters": 304653828}
{"train_lr": 0.0009864481805142873, "train_min_lr": 7.42345125064792e-07, "train_loss": 1.2298376884621878, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0862279717738812, "test_acc1": 65.2849758919039, "test_acc5": 83.41969004813872, "epoch": 8, "n_parameters": 304653828}
{"train_lr": 0.0009772304541216133, "train_min_lr": 7.354083853688332e-07, "train_loss": 1.2425422969584663, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0954815241006703, "test_acc1": 76.94300709857842, "test_acc5": 91.19171083282312, "epoch": 9, "n_parameters": 304653828}
{"train_lr": 0.0009656901448772364, "train_min_lr": 7.267238011417772e-07, "train_loss": 1.2622342561371624, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0846216440200807, "test_acc1": 65.80311023138965, "test_acc5": 84.71502711241727, "epoch": 10, "n_parameters": 304653828}
{"train_lr": 0.0009518834760077702, "train_min_lr": 7.163336828050098e-07, "train_loss": 1.2580135768900316, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0960755980931796, "test_acc1": 66.83938027416487, "test_acc5": 92.22798034193602, "epoch": 11, "n_parameters": 304653828}
{"train_lr": 0.0009358777122161002, "train_min_lr": 7.042886499706516e-07, "train_loss": 1.2496462639731665, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0865935536531302, "test_acc1": 73.056996142926, "test_acc5": 88.60103688215345, "epoch": 12, "n_parameters": 304653828}
{"train_lr": 0.0009177508319745303, "train_min_lr": 6.906473848279176e-07, "train_loss": 1.2451294435498614, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0978252713496868, "test_acc1": 73.31606458752884, "test_acc5": 91.45077815080553, "epoch": 13, "n_parameters": 304653828}
{"train_lr": 0.0008975911476214871, "train_min_lr": 6.754763462493629e-07, "train_loss": 1.2530802154603105, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1034993520149818, "test_acc1": 75.6476702121873, "test_acc5": 93.00518217729163, "epoch": 14, "n_parameters": 304653828}
{"train_lr": 0.0008754968751126867, "train_min_lr": 6.588494460099565e-07, "train_loss": 1.2559872255660594, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.107897635606619, "test_acc1": 72.79792906212683, "test_acc5": 89.11917175530152, "epoch": 15, "n_parameters": 304653828}
{"train_lr": 0.0008515756555229027, "train_min_lr": 6.408476886963341e-07, "train_loss": 1.251577908017983, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1148500103216905, "test_acc1": 79.53368093065647, "test_acc5": 95.595855594299, "epoch": 16, "n_parameters": 304653828}
{"train_lr": 0.0008259440306294099, "train_min_lr": 6.215587770605941e-07, "train_loss": 1.2606690825584035, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.126454827418694, "test_acc1": 72.53886186273604, "test_acc5": 95.07772083974255, "epoch": 17, "n_parameters": 304653828}
{"train_lr": 0.0007987268751322056, "train_min_lr": 6.010766847413161e-07, "train_loss": 1.261464463857313, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1174103608498207, "test_acc1": 69.94818820854543, "test_acc5": 91.4507779729181, "epoch": 18, "n_parameters": 304653828}
{"train_lr": 0.0007700567882770182, "train_min_lr": 5.795011984334233e-07, "train_loss": 1.2531929233421881, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.120338787482335, "test_acc1": 78.75647838375112, "test_acc5": 93.00518229588326, "epoch": 19, "n_parameters": 304653828}
{"train_lr": 0.0007400734478451037, "train_min_lr": 5.569374317374512e-07, "train_loss": 1.2519506031336884, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1262933208392216, "test_acc1": 76.68393954341276, "test_acc5": 93.26424949527404, "epoch": 20, "n_parameters": 304653828}
{"train_lr": 0.0007089229296571262, "train_min_lr": 5.334953130566369e-07, "train_loss": 1.258887875204285, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1174328895715566, "test_acc1": 73.57513178691963, "test_acc5": 92.48704765991843, "epoch": 21, "n_parameters": 304653828}
addf400 commented 3 years ago

Hi @winterfell2021, there seems to be nothing wrong with your running script, but the fine-tuning log appears to use different hyperparameters. For example, the learning rate in your log is 1e-3 and the weight decay is 0.05, which does not match your fine-tuning script (lr=2e-5, weight_decay=1e-8). Please check the fine-tuning script and make sure the recommended hyperparameters are actually applied.
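One quick way to verify which hyperparameters actually took effect is to read them back out of the per-epoch JSON log instead of trusting the launch command. A minimal sketch, assuming the log is written as one JSON record per line (as in the excerpt above) to output_dir/log.txt:

import json

# Print the learning rate and weight decay that were actually applied in each epoch,
# so they can be compared against the values passed on the command line.
with open("result/beit_large_patch16_512/log.txt") as f:
    for line in f:
        rec = json.loads(line)
        print(f'epoch {rec["epoch"]:>3}: '
              f'train_lr={rec["train_lr"]:.2e}, '
              f'train_weight_decay={rec["train_weight_decay"]:.2e}')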

donglixp commented 3 years ago

@winterfell2021 Could you confirm whether you used the correct hyperparameters for the above log?

donglixp commented 1 year ago

The code and pre-trained models of BEiT-3 can be found at aka.ms/beit3.