Hi @winterfell2021 ,
You could first rerun the evaluation step using our provided checkpoint, as shown in https://github.com/microsoft/unilm/blob/master/beit/get_started_for_image_classification.md#evaluate-our-fine-tuned-checkpoints . If you obtain the expected results, you can then follow the fine-tuning instructions at https://github.com/microsoft/unilm/blob/master/beit/get_started_for_image_classification.md#fine-tuning .
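For reference, a minimal sketch of launching that evaluation step from Python (all paths are placeholders, and the exact flags should be taken from the linked get_started doc rather than from this sketch):

```python
# Sketch of the evaluation launch; the authoritative command is in the
# linked get_started doc. Paths below are placeholders.
import subprocess

subprocess.run([
    "python", "-m", "torch.distributed.launch", "--nproc_per_node=2",
    "run_class_finetuning.py",
    "--model", "beit_large_patch16_512",
    "--data_path", "/path/to/imagenet",         # placeholder ImageNet root
    "--resume", "/path/to/beit_large_512.pth",  # placeholder checkpoint
    "--eval",
], check=True)
```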
We used the same hyperparameters as in the given instructions, without a specific sweep for 512x512. Could you also share the training logs and TensorBoard files so that we can figure out what the difference is?
@donglixp Thanks for your reply. I reran the evaluation step and reproduced the expected result. Here is my fine-tuning log:
{"train_lr": 7.142857142857128e-05, "train_min_lr": 5.37531041546429e-08, "train_loss": 1.383886338211596, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.3436247954001794, "test_acc1": 44.55958624330827, "test_acc5": 84.45595985373068, "epoch": 0, "n_parameters": 304653828}
{"train_lr": 0.00028571428571428454, "train_min_lr": 2.1501241661857156e-07, "train_loss": 1.339564881908397, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1861090302467345, "test_acc1": 61.39896523273053, "test_acc5": 85.49222918495613, "epoch": 1, "n_parameters": 304653828}
{"train_lr": 0.0004999999999999996, "train_min_lr": 3.7627172908250026e-07, "train_loss": 1.2338731759227812, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.9702010686580952, "test_acc1": 75.12953575410991, "test_acc5": 90.93264351484072, "epoch": 2, "n_parameters": 304653828}
{"train_lr": 0.0007142857142857145, "train_min_lr": 5.375310415464306e-07, "train_loss": 1.1430357340723276, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.7757947073532985, "test_acc1": 75.64767062725798, "test_acc5": 94.0414515678129, "epoch": 3, "n_parameters": 304653828}
{"train_lr": 0.0009285714285714295, "train_min_lr": 6.987903540103559e-07, "train_loss": 1.108779653441161, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.6567290494075189, "test_acc1": 81.86528720756887, "test_acc5": 96.89119218421106, "epoch": 4, "n_parameters": 304653828}
{"train_lr": 0.0009997746176942472, "train_min_lr": 7.523738481852174e-07, "train_loss": 1.069022581524526, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.7250815341105827, "test_acc1": 74.09326594851795, "test_acc5": 93.00518229588326, "epoch": 5, "n_parameters": 304653828}
{"train_lr": 0.0009977477875754866, "train_min_lr": 7.508485704385355e-07, "train_loss": 1.1153367947166164, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 0.8943284245637747, "test_acc1": 77.72020893393403, "test_acc5": 93.00518241447489, "epoch": 6, "n_parameters": 304653828}
{"train_lr": 0.000993298416218829, "train_min_lr": 7.475002251311409e-07, "train_loss": 1.1971247619949281, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.023184599326207, "test_acc1": 60.880830063103396, "test_acc5": 84.45595967584323, "epoch": 7, "n_parameters": 304653828}
{"train_lr": 0.0009864481805142873, "train_min_lr": 7.42345125064792e-07, "train_loss": 1.2298376884621878, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0862279717738812, "test_acc1": 65.2849758919039, "test_acc5": 83.41969004813872, "epoch": 8, "n_parameters": 304653828}
{"train_lr": 0.0009772304541216133, "train_min_lr": 7.354083853688332e-07, "train_loss": 1.2425422969584663, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0954815241006703, "test_acc1": 76.94300709857842, "test_acc5": 91.19171083282312, "epoch": 9, "n_parameters": 304653828}
{"train_lr": 0.0009656901448772364, "train_min_lr": 7.267238011417772e-07, "train_loss": 1.2622342561371624, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0846216440200807, "test_acc1": 65.80311023138965, "test_acc5": 84.71502711241727, "epoch": 10, "n_parameters": 304653828}
{"train_lr": 0.0009518834760077702, "train_min_lr": 7.163336828050098e-07, "train_loss": 1.2580135768900316, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0960755980931796, "test_acc1": 66.83938027416487, "test_acc5": 92.22798034193602, "epoch": 11, "n_parameters": 304653828}
{"train_lr": 0.0009358777122161002, "train_min_lr": 7.042886499706516e-07, "train_loss": 1.2496462639731665, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0865935536531302, "test_acc1": 73.056996142926, "test_acc5": 88.60103688215345, "epoch": 12, "n_parameters": 304653828}
{"train_lr": 0.0009177508319745303, "train_min_lr": 6.906473848279176e-07, "train_loss": 1.2451294435498614, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.0978252713496868, "test_acc1": 73.31606458752884, "test_acc5": 91.45077815080553, "epoch": 13, "n_parameters": 304653828}
{"train_lr": 0.0008975911476214871, "train_min_lr": 6.754763462493629e-07, "train_loss": 1.2530802154603105, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1034993520149818, "test_acc1": 75.6476702121873, "test_acc5": 93.00518217729163, "epoch": 14, "n_parameters": 304653828}
{"train_lr": 0.0008754968751126867, "train_min_lr": 6.588494460099565e-07, "train_loss": 1.2559872255660594, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.107897635606619, "test_acc1": 72.79792906212683, "test_acc5": 89.11917175530152, "epoch": 15, "n_parameters": 304653828}
{"train_lr": 0.0008515756555229027, "train_min_lr": 6.408476886963341e-07, "train_loss": 1.251577908017983, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1148500103216905, "test_acc1": 79.53368093065647, "test_acc5": 95.595855594299, "epoch": 16, "n_parameters": 304653828}
{"train_lr": 0.0008259440306294099, "train_min_lr": 6.215587770605941e-07, "train_loss": 1.2606690825584035, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.126454827418694, "test_acc1": 72.53886186273604, "test_acc5": 95.07772083974255, "epoch": 17, "n_parameters": 304653828}
{"train_lr": 0.0007987268751322056, "train_min_lr": 6.010766847413161e-07, "train_loss": 1.261464463857313, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1174103608498207, "test_acc1": 69.94818820854543, "test_acc5": 91.4507779729181, "epoch": 18, "n_parameters": 304653828}
{"train_lr": 0.0007700567882770182, "train_min_lr": 5.795011984334233e-07, "train_loss": 1.2531929233421881, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.120338787482335, "test_acc1": 78.75647838375112, "test_acc5": 93.00518229588326, "epoch": 19, "n_parameters": 304653828}
{"train_lr": 0.0007400734478451037, "train_min_lr": 5.569374317374512e-07, "train_loss": 1.2519506031336884, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1262933208392216, "test_acc1": 76.68393954341276, "test_acc5": 93.26424949527404, "epoch": 20, "n_parameters": 304653828}
{"train_lr": 0.0007089229296571262, "train_min_lr": 5.334953130566369e-07, "train_loss": 1.258887875204285, "train_loss_scale": 128.0, "train_weight_decay": 0.05000000000000001, "test_loss": 1.1174328895715566, "test_acc1": 73.57513178691963, "test_acc5": 92.48704765991843, "epoch": 21, "n_parameters": 304653828}
Hi @winterfell2021 , there seems to be nothing wrong with your running script, but the fine-tuning log appears to use different hyperparameters. For example, the peak learning rate in your log is 1e-3 and the weight decay is 0.05, which do not match your fine-tuning script (lr=2e-5, weight_decay=1e-8). Please check the fine-tuning script and ensure that the recommended hyperparameters are applied.
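As a quick way to confirm which hyperparameters a run actually used, one can parse the per-epoch JSON records like those above (a minimal sketch; log.txt is a placeholder filename):

```python
# Parse the per-epoch JSON records (one per line, as above) and print the
# peak lr, weight decay, and best acc@1 for the run.
import json

with open("log.txt") as f:
    records = [json.loads(line) for line in f if line.strip()]

peak_lr = max(r["train_lr"] for r in records)
wd = records[0]["train_weight_decay"]
best = max(records, key=lambda r: r["test_acc1"])
print(f"peak lr = {peak_lr:.1e}, weight decay = {wd}")
print(f"best acc@1 = {best['test_acc1']:.2f} at epoch {best['epoch']}")
```

On the log above, this would report a peak lr of about 1e-3 and a weight decay of 0.05, which is how the mismatch shows up.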
@winterfell2021 Could you confirm whether you used the correct hyperparameters for the above log?
The code and pre-trained models of BEiT-3 can be found at aka.ms/beit3.
Describe Model I am using (BEiT): I wonder how to reproduce acc@1 88.60 on ImageNet-1k. I used the hyperparameters in get_started_for_image_classification.md.
But I got a bad result training on 2x RTX 3090. The only changes I made were the batch size from 32 to 4 and update_freq from 2 to 64, following the effective batch size rule. Is that the problem? I hope you can share the hyperparameters behind the best evaluation result on BEiT-large-512. Thanks a lot.
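For reference, the effective batch size here is per-GPU batch size x update_freq x number of GPUs. Assuming the reference run in the instructions used 8 GPUs (an assumption, not stated in this thread), the two settings match, as this minimal sketch shows:

```python
# Effective batch size = per-GPU batch size * update_freq * num_GPUs.
def effective_batch_size(batch_size: int, update_freq: int, num_gpus: int) -> int:
    return batch_size * update_freq * num_gpus

# Reference setting (assumes 8 GPUs, which this thread does not state):
reference = effective_batch_size(32, 2, 8)  # 512
# 2x RTX 3090 setting described above:
adapted = effective_batch_size(4, 64, 2)    # 512
assert reference == adapted == 512
```

If the effective batch sizes are indeed equal, the accuracy gap is more likely explained by the lr/weight_decay mismatch noted above than by the batch size change.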