keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

The finetuning hyperparameters of resnet50 #27

Open Vickeyhw opened 1 year ago

Vickeyhw commented 1 year ago

The hyperparameter settings (batch size and learning rate) in the paper seem inconsistent with the code. Which setting reproduces the performance (80.6 acc) reported in the paper?

keyu-tian commented 1 year ago

You can check https://github.com/keyu-tian/SparK/blob/main/downstream_imagenet/arg.py#L20; those hyperparameters would result in 80.6 acc.

Training losses of each epoch:

[0.1574, 0.0063, 0.0056, 0.0054, 0.0053, 0.0053, 0.0052, 0.0052, 0.0051, 0.0051, 0.0050, 0.0050, 0.0050, 0.0050, 0.0050, 0.0049, 0.0049, 0.0049, 0.0049, 0.0050, 0.0048, 0.0048, 0.0049, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0047, 0.0048, 0.0048, 0.0047, 0.0047, 0.0048, 0.0048, 0.0047, 0.0048, 0.0047, 0.0047, 0.0047, 0.0047, 0.0047, 0.0048, 0.0047, 0.0048, 0.0047, 0.0047, 0.0047, 0.0046, 0.0046, 0.0046, 0.0047, 0.0046, 0.0046, 0.0045, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0045, 0.0045, 0.0045, 0.0045, 0.0045, 0.0046, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0044, 0.0044, 0.0044, 0.0044, 0.0044, 0.0044, 0.0045, 0.0044, 0.0044, 0.0044, 0.0044, 0.0043, 0.0044, 0.0043, 0.0044, 0.0044, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0041, 0.0042, 0.0042, 0.0042, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.0042, 0.0041, 0.0041, 0.0041, 0.0041, 0.0040, 0.0041, 0.0040, 0.0040, 0.0041, 0.0041, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0039, 0.0040, 0.0040, 0.0039, 0.0039, 0.0040, 0.0040, 0.0040, 0.0039, 0.0040, 0.0039, 0.0039, 0.0040, 0.0039, 0.0038, 0.0039, 0.0039, 0.0039, 0.0039, 0.0038, 0.0039, 0.0038, 0.0038, 0.0038, 0.0039, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0037, 0.0038, 0.0038, 0.0038, 0.0037, 0.0037, 0.0038, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0036, 0.0036, 0.0036, 0.0037, 0.0037, 0.0037, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0035, 0.0035, 0.0036, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0036, 0.0036, 0.0034, 0.0035, 0.0035, 0.0035, 0.0034, 0.0035, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0035, 0.0034, 0.0034, 0.0033, 0.0034, 0.0034, 0.0034, 0.0034, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0032, 0.0032, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0032, 0.0033, 0.0033, 0.0032, 0.0033, 0.0033, 0.0033, 0.0032, 0.0032, 0.0033, 0.0033]

Best validation accs (EMA model) of each epoch:

[0.12, 0.72, 22.60, 45.40, 52.51, 55.70, 58.43, 60.69, 62.26, 63.42, 64.43, 65.04, 65.57, 66.13, 66.57, 66.89, 67.34, 67.53, 67.94, 68.10, 68.11, 68.37, 68.62, 68.76, 68.87, 68.95, 69.18, 69.34, 69.35, 69.50, 69.67, 69.67, 69.90, 69.98, 69.98, 70.06, 70.10, 70.24, 70.28, 70.35, 70.35, 70.35, 70.44, 70.60, 70.64, 70.79, 71.00, 71.00, 71.00, 71.01, 71.04, 71.20, 71.20, 71.23, 71.23, 71.31, 71.31, 71.43, 71.46, 71.50, 71.60, 71.71, 71.71, 71.73, 71.87, 71.89, 72.13, 72.13, 72.13, 72.13, 72.19, 72.19, 72.25, 72.28, 72.43, 72.50, 72.57, 72.60, 72.69, 72.69, 72.69, 72.69, 72.77, 72.79, 72.93, 72.98, 72.98, 73.05, 73.21, 73.30, 73.30, 73.30, 73.38, 73.51, 73.53, 73.60, 73.61, 73.67, 73.67, 73.67, 73.67, 73.70, 73.79, 73.82, 74.02, 74.02, 74.09, 74.09, 74.09, 74.23, 74.27, 74.31, 74.42, 74.43, 74.50, 74.53, 74.58, 74.64, 74.72, 74.93, 74.93, 74.93, 74.93, 75.04, 75.04, 75.07, 75.14, 75.21, 75.27, 75.33, 75.36, 75.36, 75.45, 75.49, 75.57, 75.62, 75.71, 75.83, 75.83, 75.85, 75.96, 76.01, 76.06, 76.14, 76.16, 76.17, 76.29, 76.29, 76.29, 76.38, 76.45, 76.56, 76.60, 76.64, 76.71, 76.76, 76.80, 76.95, 77.03, 77.10, 77.10, 77.16, 77.18, 77.28, 77.28, 77.37, 77.38, 77.53, 77.59, 77.61, 77.61, 77.74, 77.75, 77.88, 77.96, 77.99, 78.01, 78.09, 78.15, 78.18, 78.24, 78.25, 78.26, 78.37, 78.44, 78.48, 78.63, 78.63, 78.69, 78.69, 78.69, 78.72, 78.73, 78.76, 78.80, 78.88, 78.91, 79.03, 79.07, 79.07, 79.07, 79.09, 79.25, 79.25, 79.25, 79.25, 79.30, 79.30, 79.33, 79.39, 79.44, 79.54, 79.54, 79.59, 79.61, 79.66, 79.66, 79.72, 79.77, 79.82, 79.83, 79.83, 79.89, 79.89, 79.96, 79.97, 80.04, 80.04, 80.06, 80.11, 80.18, 80.18, 80.19, 80.19, 80.19, 80.19, 80.19, 80.22, 80.25, 80.25, 80.25, 80.25, 80.28, 80.30, 80.30, 80.30, 80.30, 80.30, 80.37, 80.39, 80.39, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.48, 80.49, 80.49, 80.51, 80.52, 80.53, 80.53, 80.53, 80.53, 80.53, 80.53, 80.53, 80.55, 80.56, 80.58, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59]
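
A quick way to read the dump above: paste the two lists into Python as `losses` and `accs` (names are arbitrary) and summarize them; the accuracy curve plateaus at 80.59, consistent with the reported 80.6.

```python
# Throwaway sanity check on the curves pasted above; paste the full
# lists in as `losses` and `accs` (truncated here for brevity).
losses = [0.1574, 0.0063, 0.0056]  # ...
accs = [0.12, 0.72, 22.60]         # ...

best = max(accs)
first_epoch = accs.index(best) + 1  # epochs are 1-indexed
print(f"final train loss: {losses[-1]:.4f}")
print(f"best EMA val acc: {best:.2f} (first reached at epoch {first_epoch})")
```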
Vickeyhw commented 1 year ago

@keyu-tian I have tried this setting, except that the batch size is 2048, because 8 GPUs cannot accommodate 4096 images. Unfortunately, my training loss became NaN. Does the batch size matter that much?

keyu-tian commented 1 year ago

@Vickeyhw can you provide the command and logs?

Vickeyhw commented 1 year ago

@keyu-tian The args are:

(nstream_imagenet/main.py, line  29)=> initial args:
{'base_lr': 0.002,
 'batch_size_per_gpu': 256,
 'best_val_acc': 0.0,
 'bs': 2048,
 'clip': -1,
 'cmd': '--local_rank=0 --exp_name res50 --exp_dir '
        './output//Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e '
        '--data_path=imagenet2012/ImageNet_ILSVRC2012 '
        '--model=resnet50 --ep=300 --bs=2048 '
        '--resume_from=./resnet50_1kpretrained_timm_style.pth',
 'cur_ep': '',
 'data_path': '/home/bingxing2/public/imagenet2012/ImageNet_ILSVRC2012',
 'dataloader_workers': 8,
 'device': device(type='cuda', index=0),
 'dist_on_itp': False,
 'dist_url': 'env://',
 'drop_path': 0.05,
 'ema': 0.9999,
 'ep': 300,
 'eval_data_path': '',
 'exp_dir': 'output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e',
 'exp_name': 'res50',
 'finish_time': '',
 'first_logging': True,
 'glb_batch_size': 2048,
 'global_rank': 0,
 'img_size': 224,
 'is_local_master': True,
 'is_master': True,
 'local_rank': 0,
 'log_epoch': <bound method FineTuneArgs.log_epoch of FineTuneArgs(prog='main.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'log_txt_name': 'output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e/finetune_log.txt',
 'lr': 0.016,
 'lr_scale': 0.7,
 'mixup': 0.1,
 'model': 'resnet50',
 'opt': 'lamb',
 'remain_time': '',
 'rep_aug': 0,
 'resume_from': './resnet50_1kpretrained_timm_style.pth',
 'sbn': False,
 'tb_lg_dir': '/home/bingxing2/home/scx6008/hw/SparK/output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e/tensorboard_log',
 'train_acc': 0.0,
 'train_loss': 0.0,
 'wd': 0.02,
 'world_size': 8,
 'wp_ep': 5}
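
Note that the printed 'lr': 0.016 follows from 'base_lr': 0.002 under the usual linear scaling rule (assuming that is what arg.py applies here):

```python
# Linear LR scaling rule (assumed): lr = base_lr * glb_batch_size / 256
base_lr = 0.002
glb_batch_size = 2048
lr = base_lr * glb_batch_size / 256
print(lr)  # 0.016, matching 'lr' in the args dump above
```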

The logs are:

{"name": "res50", "cmd": "--local_rank=0 --exp_name res50  --exp_dir ./output//Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e --data_path=imagenet2012/ImageNet_ILSVRC2012 --model=resnet50 --ep=300 --bs=2048 --resume_from=./resnet50_1kpretrained_timm_style.pth",  "model": "resnet50"}

{"cur_ep": "", "train_L": 0.0, "train_acc": 0.0, "best_val_acc": 0.0, "rema": "", "fini": ""}
{"cur_ep": "1/300", "train_L": 0.15892334485948087, "train_acc": 0.62875, "best_val_acc": 4.741999879479408, "rema": "1 day, 3:49:28", "fini": "04-18 14:07"}
{"cur_ep": "1/300", "train_L": 0.1589116028137505, "train_acc": 0.67875, "best_val_acc": 4.741999879479408, "rema": "1 day, 3:49:55", "fini": "04-18 14:07"}
{"cur_ep": "2/300", "train_L": 0.006166074915975333, "train_acc": 17.9, "best_val_acc": 4.741999879479408, "rema": "23:23:38", "fini": "04-18 09:46"}
{"cur_ep": "2/300", "train_L": 0.006192157221585512, "train_acc": 17.850625, "best_val_acc": 4.741999879479408, "rema": "23:23:11", "fini": "04-18 09:45"}
{"cur_ep": "3/300", "train_L": 0.005531815142929554, "train_acc": 31.03125, "best_val_acc": 4.741999879479408, "rema": "23:10:48", "fini": "04-18 09:37"}
{"cur_ep": "3/300", "train_L": 0.005463680490851402, "train_acc": 31.53375, "best_val_acc": 4.741999879479408, "rema": "23:10:48", "fini": "04-18 09:37"}
{"cur_ep": "4/300", "train_L": 0.005408975677192211, "train_acc": 34.206875, "best_val_acc": 4.741999879479408, "rema": "23:16:55", "fini": "04-18 09:48"}
{"cur_ep": "4/300", "train_L": 0.0053789736032485965, "train_acc": 34.44125, "best_val_acc": 4.741999879479408, "rema": "23:16:55", "fini": "04-18 09:48"}
{"cur_ep": "5/300", "train_L": 0.005379272519052029, "train_acc": 34.675, "best_val_acc": 4.741999879479408, "rema": "23:14:28", "fini": "04-18 09:50"}
{"cur_ep": "5/300", "train_L": 0.005380104035139084, "train_acc": 34.819375, "best_val_acc": 4.741999879479408, "rema": "23:14:28", "fini": "04-18 09:50"}
{"cur_ep": "6/300", "train_L": 0.005402451483905315, "train_acc": 34.683125, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:28:58", "fini": "04-18 11:10"}
{"cur_ep": "6/300", "train_L": 0.00541508878916502, "train_acc": 34.975625, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:29:01", "fini": "04-18 11:10"}
{"cur_ep": "7/300", "train_L": 0.005425414913892746, "train_acc": 35.530625, "best_val_acc": 43.54200065135956, "rema": "22:59:30", "fini": "04-18 09:45"}
{"cur_ep": "7/300", "train_L": 0.005408091994374991, "train_acc": 35.624375, "best_val_acc": 43.54200065135956, "rema": "22:59:30", "fini": "04-18 09:45"}
{"cur_ep": "8/300", "train_L": 0.005414580475538969, "train_acc": 35.54125, "best_val_acc": 43.54200065135956, "rema": "22:52:39", "fini": "04-18 09:43"}
{"cur_ep": "8/300", "train_L": 0.005382590828090906, "train_acc": 35.548125, "best_val_acc": 43.54200065135956, "rema": "22:52:39", "fini": "04-18 09:43"}
{"cur_ep": "9/300", "train_L": 0.005471015560626983, "train_acc": 34.43, "best_val_acc": 43.54200065135956, "rema": "22:54:35", "fini": "04-18 09:50"}
{"cur_ep": "9/300", "train_L": 0.005475558865070343, "train_acc": 34.46375, "best_val_acc": 43.54200065135956, "rema": "22:54:35", "fini": "04-18 09:50"}
{"cur_ep": "10/300", "train_L": 0.005533626601845026, "train_acc": 33.153125, "best_val_acc": 43.54200065135956, "rema": "22:50:06", "fini": "04-18 09:50"}
{"cur_ep": "11/300", "train_L": 0.005674835596978664, "train_acc": 30.861875, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:08:31", "fini": "04-18 11:13"}
{"cur_ep": "11/300", "train_L": 0.00564453030526638, "train_acc": 30.945, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:08:34", "fini": "04-18 11:13"}
{"cur_ep": "12/300", "train_L": 0.005810007998347282, "train_acc": 28.496875, "best_val_acc": 43.54200065135956, "rema": "22:46:05", "fini": "04-18 09:56"}
{"cur_ep": "12/300", "train_L": 0.005785862766951323, "train_acc": 28.413125, "best_val_acc": 43.54200065135956, "rema": "22:46:05", "fini": "04-18 09:56"}
{"cur_ep": "13/300", "train_L": 0.005920035427808761, "train_acc": 25.194375, "best_val_acc": 43.54200065135956, "rema": "22:33:55", "fini": "04-18 09:48"}
{"cur_ep": "13/300", "train_L": 0.005946401672065258, "train_acc": 25.08125, "best_val_acc": 43.54200065135956, "rema": "22:33:55", "fini": "04-18 09:48"}
{"cur_ep": "14/300", "train_L": 0.006212903738021851, "train_acc": 19.32875, "best_val_acc": 43.54200065135956, "rema": "22:36:47", "fini": "04-18 09:56"}
{"cur_ep": "14/300", "train_L": 0.006215040449798107, "train_acc": 19.2525, "best_val_acc": 43.54200065135956, "rema": "22:36:47", "fini": "04-18 09:56"}
{"cur_ep": "15/300", "train_L": 0.006856135655939579, "train_acc": 8.3275, "best_val_acc": 43.54200065135956, "rema": "22:27:26", "fini": "04-18 09:51"}
{"cur_ep": "16/300", "train_L": 0.00847176744043827, "train_acc": 0.505, "best_val_acc": 43.54200065135956, "rema": "23:35:44", "fini": "04-18 11:04"}
{"cur_ep": "16/300", "train_L": 0.00849286153614521, "train_acc": 0.4875, "best_val_acc": 43.54200065135956, "rema": "23:36:30", "fini": "04-18 11:05"}
{"cur_ep": "17/300", "train_L": 0.4059472487047315, "train_acc": 0.17875, "best_val_acc": 43.54200065135956, "rema": "21:58:07", "fini": "04-18 09:31"}
{"cur_ep": "17/300", "train_L": 0.4075041047215462, "train_acc": 0.1925, "best_val_acc": 43.54200065135956, "rema": "21:58:55", "fini": "04-18 09:32"}
{"cur_ep": "18/300", "train_L": 2.594160234707594, "train_acc": 0.20125, "best_val_acc": 43.54200065135956, "rema": "21:50:27", "fini": "04-18 09:28"}
{"cur_ep": "18/300", "train_L": 2.599315336358547, "train_acc": 0.19, "best_val_acc": 43.54200065135956, "rema": "21:50:27", "fini": "04-18 09:28"}
{"cur_ep": "19/300", "train_L": 168.27923201017379, "train_acc": 0.213125, "best_val_acc": 43.54200065135956, "rema": "21:47:55", "fini": "04-18 09:30"}
{"cur_ep": "19/300", "train_L": 168.41776486291886, "train_acc": 0.20375, "best_val_acc": 43.54200065135956, "rema": "21:47:55", "fini": "04-18 09:30"}
{"cur_ep": "20/300", "train_L": 9278.141375096131, "train_acc": 0.196875, "best_val_acc": 43.54200065135956, "rema": "21:40:22", "fini": "04-18 09:28"}
{"cur_ep": "20/300", "train_L": 9242.067996807862, "train_acc": 0.225625, "best_val_acc": 43.54200065135956, "rema": "21:40:25", "fini": "04-18 09:28"}
{"cur_ep": "21/300", "train_L": 385708.6738011719, "train_acc": 0.193125, "best_val_acc": 43.54200065135956, "rema": "22:55:53", "fini": "04-18 10:48"}
{"cur_ep": "21/300", "train_L": 384956.72077226563, "train_acc": 0.216875, "best_val_acc": 43.54200065135956, "rema": "22:55:53", "fini": "04-18 10:48"}
{"cur_ep": "22/300", "train_L": 4882009.3119, "train_acc": 0.20625, "best_val_acc": 43.54200065135956, "rema": "21:27:03", "fini": "04-18 09:24"}
{"cur_ep": "22/300", "train_L": 4842194.24495, "train_acc": 0.22, "best_val_acc": 43.54200065135956, "rema": "21:27:06", "fini": "04-18 09:24"}
{"cur_ep": "23/300", "train_L": 51736270.4836, "train_acc": 0.209375, "best_val_acc": 43.54200065135956, "rema": "21:28:39", "fini": "04-18 09:30"}
{"cur_ep": "23/300", "train_L": 52044852.8624, "train_acc": 0.169375, "best_val_acc": 43.54200065135956, "rema": "21:28:42", "fini": "04-18 09:30"}
{"cur_ep": "24/300", "train_L": 122264843.6944, "train_acc": 0.22125, "best_val_acc": 43.54200065135956, "rema": "21:21:00", "fini": "04-18 09:27"}
{"cur_ep": "25/300", "train_L": 462680769.6992, "train_acc": 0.184375, "best_val_acc": 43.54200065135956, "rema": "21:16:28", "fini": "04-18 09:27"}
{"cur_ep": "25/300", "train_L": 464397992.0096, "train_acc": 0.211875, "best_val_acc": 43.54200065135956, "rema": "21:16:30", "fini": "04-18 09:27"}
{"cur_ep": "26/300", "train_L": 1761032724.8896, "train_acc": 0.198125, "best_val_acc": 43.54200065135956, "rema": "22:27:24", "fini": "04-18 10:43"}
{"cur_ep": "26/300", "train_L": 1763697504.5888, "train_acc": 0.188125, "best_val_acc": 43.54200065135956, "rema": "22:27:51", "fini": "04-18 10:43"}
{"cur_ep": "27/300", "train_L": NaN, "train_acc": 0.20375, "best_val_acc": 43.54200065135956, "rema": "20:57:51", "fini": "04-18 09:18"}
{"cur_ep": "27/300", "train_L": NaN, "train_acc": 0.211875, "best_val_acc": 43.54200065135956, "rema": "20:58:21", "fini": "04-18 09:18"}
{"cur_ep": "28/300", "train_L": NaN, "train_acc": 0.19125, "best_val_acc": 43.54200065135956, "rema": "20:49:07", "fini": "04-18 09:14"}
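
The run looks healthy through roughly epoch 9, then the loss grows geometrically from epoch 17 onward until it becomes NaN at epoch 27; note 'clip': -1 in the args means gradient clipping is disabled. A generic guard like the sketch below (illustrative only, not part of the SparK codebase) can keep such a run alive long enough to diagnose the divergence:

```python
import math
import torch

def guarded_step(loss, model, optimizer, clip_norm=5.0):
    """Generic divergence guard (illustrative, not SparK code):
    skip non-finite losses and clip gradients before stepping."""
    if not math.isfinite(loss.item()):
        optimizer.zero_grad(set_to_none=True)  # drop this batch entirely
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```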
keyu-tian commented 1 year ago

I see, I will check that. Perhaps I copied the wrong code for the LAMB optimizer.

BTW, have you tried a ConvNeXt-small? Would it fail too?
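
For context on how a miscopied LAMB can diverge: LAMB (You et al.) rescales an Adam-style update by a per-layer trust ratio ||w|| / ||update||, so a bug in that ratio (a missing zero-norm guard, dropped bias correction) gets amplified at large batch sizes. A minimal sketch of standard LAMB, not this repo's implementation:

```python
import torch

class SimpleLAMB(torch.optim.Optimizer):
    """Minimal LAMB sketch (You et al.): Adam-style moments rescaled by a
    per-layer trust ratio. Illustrative only; not this repo's optimizer."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            b1, b2 = group['betas']
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state['step'] = 0
                    state['m'] = torch.zeros_like(p)
                    state['v'] = torch.zeros_like(p)
                state['step'] += 1
                m, v = state['m'], state['v']
                m.mul_(b1).add_(p.grad, alpha=1 - b1)
                v.mul_(b2).addcmul_(p.grad, p.grad, value=1 - b2)
                m_hat = m / (1 - b1 ** state['step'])   # bias correction
                v_hat = v / (1 - b2 ** state['step'])
                update = m_hat / (v_hat.sqrt() + group['eps']) + group['weight_decay'] * p
                w_norm, u_norm = p.norm(), update.norm()
                # the zero-norm guard is exactly the kind of detail a
                # miscopied implementation can drop, blowing up the step
                trust = (w_norm / u_norm).item() if w_norm > 0 and u_norm > 0 else 1.0
                p.add_(update, alpha=-group['lr'] * trust)
```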

Vickeyhw commented 1 year ago

@keyu-tian ConvNeXt-small seems normal so far.

Vickeyhw commented 1 year ago

@keyu-tian Have you found the cause of the ResNet-50 fine-tuning problem? ConvNeXt-small reaches 83.96 validation acc when fine-tuned from your released pretrained weights.