vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

The performance of MAE does not match the paper results on ImageNet #313

Closed HuangChiEn closed 1 year ago

HuangChiEn commented 1 year ago

Firstly, thanks for releasing such an amazing framework, which covers almost all of the SOTA SSL methods.

Would you mind looking into why the performance of MAE does not match the paper results on ImageNet? This paper reports 82.1% top-1 accuracy for 100-epoch pretraining of a ViT-Base architecture with batch size 4096 on the ImageNet dataset (with 100 epochs of fine-tuning).

They mention the results come from running the official code for 100, 300, and 1600 epochs. For the 300-epoch and 1600-epoch runs we also find the accuracy matches other papers, so we think the 100-epoch result is verified as well.

On the other hand, we used this version of solo-learn to run the MAE pretraining; the pretraining configuration as well as the training trace can be found in the link.

We keep exactly the same configuration, but the resulting top-1 accuracy is only 77.4%, about 4% lower than the aforementioned 82.1%, which I think is well beyond random-seed effects and the acceptable variance between experiments.

In addition, the fine-tuning configuration as well as the fine-tuning trace can be found in the link.

All the above fine-tuning configurations match the official implementation's script (except for the number of epochs).
Any suggestion is appreciated!!

vturrisi commented 1 year ago

Hey,

Thanks for letting us know about this. I'm a bit busy until probably next year, but I'll try to check before then. Nonetheless, try to see if the parameters that we have are the same as in the original paper as we might have missed some.

I'll try to check it myself as soon as I can.

HuangChiEn commented 1 year ago

Thanks for your help ~ We're also trying to figure out this issue; I'll just keep it open here.

HuangChiEn commented 1 year ago


Update:

Hello, over the past several weeks we found that start_lr (warmup_start_lr in my config) should be set to exactly 0 in the fine-tuning stage; since the default setting is a small number instead of zero, this modification increases top-1 accuracy by about 2% (from 77.4%). So the reproduced version of MAE in solo-learn currently achieves 79.6% top-1 accuracy on ImageNet for 100-epoch pretraining (and 100-epoch fine-tuning).
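For illustration, here is a minimal sketch of a linear-warmup + cosine schedule (illustrative only, not solo-learn's actual scheduler code) showing how a non-zero warmup_start_lr shifts the first learning rates compared with starting the warmup at exactly 0; the lr = 5e-4 and 5 warmup epochs below are just example fine-tuning values:

```python
import math

def warmup_cosine_lr(epoch, max_epochs, base_lr, warmup_epochs, warmup_start_lr):
    """Toy linear-warmup + cosine-decay schedule (a sketch, not solo-learn's implementation)."""
    if epoch < warmup_epochs:
        # Linear warmup from warmup_start_lr up to base_lr.
        return warmup_start_lr + (base_lr - warmup_start_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (max_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

for start_lr in (3e-5, 0.0):  # default small value vs. exactly zero
    first_lrs = [warmup_cosine_lr(e, 100, 5e-4, 5, start_lr) for e in range(3)]
    print(start_lr, [f"{lr:.2e}" for lr in first_lrs])
```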


The wandb links (the other run was deleted) are provided for pretraining and fine-tuning, respectively.

However, the accuracy reported for the official code is 82.1%, so there are still some hyperparameters to be tuned. Any suggestion will be appreciated!!

vturrisi commented 1 year ago

Glad to hear that. I still haven't found the time to look into it.

One experiment that might be very interesting (resources permitting) is to see how much a model pretrained with the official code differs from a model pretrained with solo. I would advise pretraining a model with the official code and then running our fine-tuning; that way, we can tell whether the problem is in the pretraining or in the fine-tuning.

DonkeyShot21 commented 1 year ago

Another thing to consider is that MAE is very sensitive to the JPEG decoding library you use. For instance, if you are using pillow-simd, you can expect a ~1% loss in accuracy with respect to normal Pillow.
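If it helps, here is one quick way to check which Pillow build is actually installed; pillow-simd reuses the PIL namespace, and its release versions typically carry a .postN suffix, so the string check below is only a rough heuristic:

```python
import PIL
from PIL import features

# pillow-simd installs under the same "PIL" / "Pillow" name; its version strings
# usually end with ".postN" (e.g. "9.0.0.post1"), so this is only a rough check.
print("Pillow version:", PIL.__version__)
print("Looks like pillow-simd:", ".post" in PIL.__version__)

# Prints full build info, including which image codecs are supported.
features.pilinfo()
```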

HuangChiEn commented 1 year ago



TL;DR (optional read): Thanks for your reply. Over the past several weeks our team also ran some interesting tests with the official code.

One of our members re-ran the 100-epoch pretraining with the official code, based on the official MAE 1600-epoch config, and we only got 80.62%. At the same time, he also found a paper reporting 81.2% accuracy for 100-epoch pretraining. So we believe the reported 82.1% top-1 accuracy must involve some additional parameter tuning.

On the other hand, the gap between 80.62% and 81.2% could be attributed to the random-seed setup (variance: 0.6%), so we consider 80.62% an acceptable reproduced result, while the remaining ~1% accuracy gap may be the last issue we need to think about.


Thanks for your help; if you have free time to look into this issue, we'll keep waiting and leave it open. If we figure out anything, we'll also provide the ImageNet 100-epoch config details and close this issue ~

HuangChiEn commented 1 year ago

@DonkeyShot21 Thanks for joining in on this issue, we really appreciate it!!

To speed up data augmentation, we applied DALI, which indeed decodes JPEG with its own special routines.

So, do you suggest that we disable DALI and use the image_folder setup for both the pretraining and fine-tuning stages to get the ~1% accuracy increase? (This also makes sense, since the official code did not use DALI to speed up image loading.)

zeyuyun1 commented 1 year ago

@HuangChiEn Hi, I wonder if you also tried to benchmark MAE on CIFAR-10. I ran the training script using the config file in the repo and got 83% top-1 evaluation accuracy. Does this look right?

HuangChiEn commented 1 year ago


I'm sorry, but we don't have any experience with pretraining/fine-tuning MAE on CIFAR-10.

However, I think that's a bit lower than expected if you both pretrained and fine-tuned it. Also, from a practical point of view, a small-scale supervised model can easily surpass this accuracy, so it may not be the target of SSL research.

Besides, I believe the solo-learn contributors have already provided a well-tuned config for CIFAR-10/CIFAR-100 here. Did you run the experiment with that configuration? If you have your own, could you also share it on wandb?

zeyuyun1 commented 1 year ago

Yes. This is the config I used to pretrain MAE. This is the run on wandb: https://wandb.ai/chobitstian/solo-learn. It's the first one, named "mae-cifar10"; please ignore all the other runs.

HuangChiEn commented 1 year ago


I had a quick scan of your pretraining config; may I suggest that you follow the configuration given by solo-learn, which I believe is well tested.

For example, the warmup epochs are mismatched: you set 10, but the config uses 40.

Also, note that this version of the solo-learn config is more straightforward.


Since few benchmarks directly run CIFAR-10 pretraining and then fine-tuning, I also can't judge this accuracy. However, I believe it should be close to MoCo v3's performance (93.10/99.80); at the very least, DeepCluster V2 (88.85/99.58) should be a lower bound.

zeyuyun1 commented 1 year ago

Huh, that's interesting. The current config (the one you mentioned before) doesn't mention the warmup epochs.

I think it might also be caused by the effect of DDP: when you simply increase the number of GPUs, the effective batch size becomes larger, but I don't think the current code adjusts the learning rate for that. So I'll try a single GPU and the old config file you suggested, and I'll let you know the result.
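For what it's worth, the usual way to reason about this is the linear LR scaling rule; the sketch below is illustrative only and not necessarily what solo-learn does internally:

```python
def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
              accumulate_grad_batches: int = 1, reference_batch_size: int = 256) -> float:
    """Linear scaling rule: lr = base_lr * effective_batch_size / reference_batch_size."""
    effective_batch_size = batch_size_per_gpu * num_gpus * accumulate_grad_batches
    return base_lr * effective_batch_size / reference_batch_size

# Going from 1 GPU to 4 GPUs with the same per-GPU batch size quadruples the
# effective batch size, so the learning rate would be scaled by 4x as well.
print(scaled_lr(base_lr=1.5e-4, batch_size_per_gpu=256, num_gpus=1))  # 0.00015
print(scaled_lr(base_lr=1.5e-4, batch_size_per_gpu=256, num_gpus=4))  # 0.0006
```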

Thanks for the help!

HuangChiEn commented 1 year ago

Good morning. Thanks to DonkeyShot21's suggestion, we're glad to have found a suitable configuration for MAE in both the pretraining and fine-tuning stages. The final accuracy reaches 81.6% top-1 and 95.5% top-5 (yes, solo-learn can work slightly better than the official code with the same config).

The following wandb links provide the detailed configuration for pretraining MAE for 100 epochs on the ImageNet dataset: pretraining, fine-tuning.

@DonkeyShot21 @vturrisi If you don't mind, this configuration could also be adopted in solo-learn and the ImageNet accuracy recorded for issue https://github.com/vturrisi/solo-learn/issues/153.


I also provide the configurations in easy_configer format:

Pretraining 100 epochs on ImageNet (MAE):


```ini
# Note : follow the EasyCV cfg to run 100ep, since the 400ep cfg also follows the 1600ep cfg with a modified epoch count.
# 1600ep : https://github.com/alibaba/EasyCV/blob/master/configs/selfsup/mae/mae_vit_base_patch16_8xb64_1600e.py
seed = 42@int

[data_cfg]
dataset = imagenet@str
# Note : .h5 dataset may process faster..
train_data_path = {'path':'/data/imgnet/train'}@Path   # do not forget to register the Path class of cfger
val_data_path = {'path':'/data/imgnet/val'}@Path
data_fraction = -1.0@float
data_format = image_folder@str
num_workers = 8@int

[model_cfg]
method = mae@str
backbone = vit_base@str
decoder_embed_dim = 512@int
decoder_depth = 8@int
decoder_num_heads = 16@int
mask_ratio = 0.75@float

[train_cfg]
batch_size = 512@int   # 512 per device; effective batch size up to 4096 (original paper)
max_epochs = 100@int
precision = 16@int

[optmz_cfg]
optimizer = adamw@str
adamw_beta1 = 0.9@float
adamw_beta2 = 0.95@float
lr = 1.5e-4@float
classifier_lr = 1.5e-4@float
weight_decay = 0.05@float
scheduler = warmup_cosine@str
warmup_epochs = 40@int
warmup_start_lr = 0.00000@float

[gpu_cfg]
devices = 0, 1@str
accelerator = gpu@str
strategy = ddp@str
accumulate_grad_batches = 4@int
dali_device = gpu@str

[wandb_cfg]
name = wodali-mae-vitb-pt100ep-baseline@str
entity = josef@str
project = MixSim@str

[trfs_cfg]
num_crops_per_aug = [1]@list
brightness = [0]@list
contrast = [0]@list
saturation = [0]@list
hue = [0]@list
gray_scale_prob = [0]@list
gaussian_prob = [0]@list
solarization_prob = [0]@list
min_scale = [0.2]@list

[store_true]
wandb = True@bool
sync_batchnorm = True@bool
save_checkpoint = True@bool
norm_pix_loss = True@bool
# other flags for store_true
debug_augmentations = True@bool   # transformation debug..
no_labels = False@bool            # for custom data only..
auto_resume = False@bool
auto_umap = False@bool
```
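As a quick sanity check (my own arithmetic, not part of the config), the effective pretraining batch size implied by the values above works out to the 4096 used in the MAE paper:

```python
# Values taken from the [train_cfg] / [gpu_cfg] sections above.
batch_size_per_gpu = 512
num_gpus = 2                  # devices = 0, 1
accumulate_grad_batches = 4

effective_batch_size = batch_size_per_gpu * num_gpus * accumulate_grad_batches
print(effective_batch_size)   # 512 * 2 * 4 = 4096
```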

Fine-tuning 100 epochs on ImageNet:

```ini
seed = 42@int

[data_cfg]
dataset = imagenet@str
train_data_path = {'path':'/data/imgnet/train'}@Path   # do not forget to register the Path class of cfger
val_data_path = {'path':'/data/imgnet/val'}@Path
data_fraction = -1.0@float
data_format = image_folder@str
num_workers = 12@int

[model_cfg]
method = mae@str
backbone = vit_base@str

[train_cfg]
batch_size = 256@int   # effective batch size : 1024
max_epochs = 100@int
precision = 16@int

[finetune_method]
pretrain_method = vit@str
# fixup-cfg : 0.75
layer_decay = 0.65@float
label_smoothing = 0.1@float
mixup = 0.8@float
cutmix = 1.0@float
drop_path = 0.1@float

[optmz_cfg]
# 5e-4 follows the torch implementation : https://github.com/facebookresearch/mae/blob/main/FINETUNE.md
lr = 5e-4@float
weight_decay = 0.05@float
optimizer = adamw@str
adamw_beta1 = 0.9@float
adamw_beta2 = 0.999@float
scheduler = warmup_cosine@str
warmup_epochs = 5@int
warmup_start_lr = 0.00000@float

[gpu_cfg]
devices = 0, 1, 4, 5@str
accelerator = gpu@str
strategy = ddp@str
dali_device = gpu@str

[wandb_cfg]
name = test_mae-vitb-ft100ep-baseline-pt100ep@str
entity = josef@str
project = MixSim@str

[ckpt_cfg]
pretrained_feature_extractor = /workspace/scripts/trained_models/mae/2vafj22o/wodali-mae-vitb-pt100ep-baseline-2vafj22o-ep=99.ckpt@str
checkpoint_dir = {'path':'/workspace/scripts/trained_models'}@Path   # you can customize the ckpt path
checkpoint_frequency = 10@int   # (how many epochs)

[store_true]
finetune = True@bool
wandb = True@bool
save_checkpoint = True@bool
auto_resume = False@bool
sync_batchnorm = True@bool
```
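As a side note on layer_decay = 0.65: the MAE fine-tuning recipe applies layer-wise learning-rate decay, where later transformer blocks get a larger LR than earlier ones. A minimal sketch of the multiplier computation (illustrative only, not solo-learn's exact implementation):

```python
def layerwise_lr_multipliers(num_layers: int = 12, layer_decay: float = 0.65) -> list[float]:
    """Per-layer LR multipliers: layer i (0 = embeddings, num_layers = head) gets layer_decay ** (num_layers - i)."""
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

mults = layerwise_lr_multipliers()  # ViT-Base has 12 transformer blocks
# Early blocks train with a tiny fraction of the base LR, the head with the full LR.
print(f"first: {mults[0]:.4f}, middle: {mults[6]:.4f}, last: {mults[-1]:.4f}")
# e.g. first ~0.0057, middle ~0.0754, last = 1.0
```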


@zeyuyun1 I'll close this issue in 2 days; you can open a new issue for your question if you still find that the accuracy doesn't match on the CIFAR-10 dataset.

zeyuyun1 commented 1 year ago

Sorry, I'm a little confused about your comment. You said "The final accuracy reaches 81.6% top-1 and 95.5% top-5"; is this for CIFAR-10? 81.6% top-1 acc would be pretty low, right?

HuangChiEn commented 1 year ago

Oh, haha ~ the (81.6% top-1 acc, 95.5% top-5 acc) numbers are actually ImageNet accuracy..

Yeah, I think CIFAR-10 should be higher than that ~

vturrisi commented 1 year ago

@HuangChiEn Thanks for providing that! So the issue was fixed by using the default image folder instead of DALI? I'll convert your config to our configuration format and open a PR. Can you also provide the pretrained and fine-tuned checkpoints?

HuangChiEn commented 1 year ago


Yes, I think we only modified the following parts:

  1. We disabled DALI by setting image_folder mode in both the pretraining and fine-tuning stages. (Personally, I think running without DALI matters more for pretraining; for fine-tuning it may only decrease accuracy slightly.)

  2. We fixed warmup_start_lr to exactly zero (instead of 3e-5) in the fine-tuning stage. Besides, we also set warmup_start_lr to zero in the pretraining stage to align with the official MAE code.

    However, I think the default setting (3e-5) in the pretraining stage may also increase accuracy. (I ran that experiment and saw a slight accuracy increase during pretraining, but I interrupted it after a few epochs without running the whole 100 epochs.)

About the checkpoints: we can provide them, but it may take a while to prepare. I'll paste the Google Drive links here and then close the issue ~

vturrisi commented 1 year ago

Sure. I'll update the config files. Thanks for the help. Let me know when you have the checkpoints and I'll add the results/checkpoint to the readme.

HuangChiEn commented 1 year ago


Morning, the resulting checkpoints can be found at the following Google Drive links: 🔗 pretraining 🔗 fine-tuning

Let's leave this issue open for 2 days; if you encounter any problem downloading or can't find the ckpt, etc., please tag me ~

vturrisi commented 1 year ago

@HuangChiEn Added the checkpoints to our zoo and added the results in #321.

HuangChiEn commented 1 year ago


Everything looks good ~ closing the issue.

vturrisi commented 1 year ago

@HuangChiEn Thanks again for debugging it for us and providing the checkpoints/results :)