keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Finetuning epochs #52

Closed ds2268 closed 11 months ago

ds2268 commented 12 months ago

Dear authors,

Regarding U2 from reviewer 71p6: "The sparse convolution seems cannot largely speed up the training process like MAE for ViT and the model requires 300 epoch fine-tuning that is the same as the configuration of training from scratch."

I think the reviewer's point is that in MAE, the authors fine-tuned for only 100 epochs (ViT-B) and 50 epochs (ViT-L/H) (MAE paper, A.1, Table 9), while the supervised models trained from scratch in the MAE paper were trained for 300 epochs (A.2, Table 11).

SparK, by contrast, was fine-tuned for 300 epochs, the same configuration as training supervised ViTs from scratch. MAE thus achieved comparable performance with 3-6x fewer fine-tuning epochs.

Could you also report your results with the same fine-tuning configuration (50/100 epochs) used in MAE? Only then can the comparison be fair.

Comment from: https://openreview.net/forum?id=NRxydtWup1S

ds2268 commented 12 months ago

Partially answered in #40.

keyu-tian commented 11 months ago

Yes in #40 we explained some. We didn't tune much on finetuning like convnextv2, and just copied the recipes from AAMIM or RSB. Fewer epochs (with hyperparameter searching if necessary) are worty to try, but currently we do not have enough computational resources. Maybe one can use our checkpoint and codebase to do that.