microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers

args used for first table in README #10

tileb1 opened this issue 3 years ago

tileb1 commented 3 years ago

Hello, could you please provide the args used to run main_esvit.py for each entry in the table below (the first table in the README)? Do the args differ between entries?

| arch | params | linear | k-NN | download | logs |
|------|--------|--------|------|----------|------|
| ResNet-50 | 23M | 75.7% | 71.3% | full ckpt | train, linear, knn |
| EsViT (Swin-T, W=7) | 28M | 78.0% | 75.7% | full ckpt | train, linear, knn |
| EsViT (Swin-S, W=7) | 49M | 79.5% | 77.7% | full ckpt | train, linear, knn |
| EsViT (Swin-B, W=7) | 87M | 80.4% | 78.9% | full ckpt | train, linear, knn |
| EsViT (Swin-T, W=14) | 28M | 78.7% | 77.0% | full ckpt | train, linear, knn |
| EsViT (Swin-S, W=14) | 49M | 80.8% | 79.1% | full ckpt | train, linear, knn |
| EsViT (Swin-B, W=14) | 87M | 81.3% | 79.3% | full ckpt | train, linear, knn |

Thank you!

ChunyuanLI commented 3 years ago

Good question! You can find the args we used for each run in the released full ckpt: load each checkpoint and inspect the `args` key.

In general, we tuned very little to produce the reported results, so the hyper-parameter settings are similar across configurations. For example, here is one typical setting, obtained by loading the released EsViT (Swin-T, W=7) checkpoint and printing its `args` entry:

```
Namespace(arch='swin_tiny', batch_size_per_gpu=32,
          cfg='experiments/imagenet/swin/swin_tiny_patch4_window7_224.yaml',
          clip_grad=3.0,
          data_path='/msrhyper-weka/public/penzhan/oscar/phillytools/data/sasa/imagenet/2012',
          dist_url='env://', epochs=300, freeze_last_layer=1,
          global_crops_scale=(0.4, 1.0), gpu=0, local_crops_number=8,
          local_crops_scale=(0.05, 0.4), local_rank=0, lr=0.0005, min_lr=1e-06,
          momentum_teacher=0.996, norm_last_layer=False, num_workers=10,
          optimizer='adamw', opts=[], out_dim=65536,
          output_dir='/mnt/output_storage/dino_exp/swin//swin_tiny/bl_lr0.0005_gpu16_bs32_dense_multicrop_epoch300',
          patch_size=16, rank=0, saveckp_freq=20, seed=0, teacher_temp=0.07,
          use_bn_in_head=False, use_dense_prediction=True, use_fp16=True,
          warmup_epochs=10, warmup_teacher_temp=0.04, warmup_teacher_temp_epochs=30,
          weight_decay=0.04, weight_decay_end=0.4, world_size=16, zip_mode=True)
```
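Concretely, something like the following is enough to read the args out of a downloaded checkpoint (a minimal sketch; `checkpoint.pth` is a placeholder for whatever file name you saved the full ckpt under):

```python
import torch

# Load the released full ckpt on CPU and print the stored pre-training args.
# "checkpoint.pth" is a placeholder for the downloaded checkpoint file.
ckpt = torch.load("checkpoint.pth", map_location="cpu")
print(ckpt["args"])  # prints the Namespace shown above
```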
tileb1 commented 3 years ago

Ah yes, didn't think of loading from the checkpoint... Thanks!

shallowtoil commented 3 years ago

Hi, @ChunyuanLI. I have been trying to download the checkpoints to read the pre-training args, but the download speed is extremely slow and the download often fails halfway. Could you kindly share the args as separate links?
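In the meantime, a workaround sketch for flaky connections: resume the partial file with an HTTP range request instead of restarting from zero. The URL below is a placeholder, and this assumes the hosting server honors `Range` headers:

```python
import os
import requests

URL = "https://example.com/esvit_swin_tiny_full_ckpt.pth"  # placeholder URL
OUT = "checkpoint.pth"

# Resume from however many bytes already made it to disk.
pos = os.path.getsize(OUT) if os.path.exists(OUT) else 0
headers = {"Range": f"bytes={pos}-"} if pos else {}

with requests.get(URL, headers=headers, stream=True, timeout=60) as r:
    r.raise_for_status()
    # 206 Partial Content means the server honored the range; append in that case,
    # otherwise start the file over from scratch.
    mode = "ab" if r.status_code == 206 else "wb"
    with open(OUT, mode) as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```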