Hi,
I'm training EsViT from scratch on a custom dataset (1.7M images) with a Swin-Tiny backbone (W=14), a batch size of 64, the default lr and wd, and the following hyperparameters:
--teacher_temp 0.04 \
--warmup_teacher_temp 0.03 \
--momentum_teacher 0.9996 \
--warmup_epochs 10 \
--warmup_teacher_temp_epochs 30 \
--use_dense_prediction True \
--use_fp16 True \
--out_dim 65536 \
--epochs 300 \
The loss has stopped decreasing from epoch 70 onwards.
Which hyperparameters would you recommend tuning if I resume from, say, epoch 70?
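For reference, here is a minimal sketch of what I expect the teacher temperature to look like with the values above. It assumes a DINO-style schedule (linear warmup, then constant), which I have not checked against the actual EsViT code; `teacher_temp_schedule` is a hypothetical helper I wrote for illustration, not a function from the repo.

```python
def teacher_temp_schedule(warmup_temp, final_temp, warmup_epochs, total_epochs):
    """Hypothetical helper: per-epoch teacher temperature.

    Assumption: linear warmup from warmup_temp to final_temp over
    warmup_epochs, then constant at final_temp for the remaining epochs.
    """
    schedule = []
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            # Linear interpolation; reaches final_temp on the last warmup epoch.
            frac = epoch / max(warmup_epochs - 1, 1)
            schedule.append(warmup_temp + frac * (final_temp - warmup_temp))
        else:
            schedule.append(final_temp)
    return schedule


# Values taken from the flags in this post.
temps = teacher_temp_schedule(
    warmup_temp=0.03, final_temp=0.04, warmup_epochs=30, total_epochs=300
)
print(temps[0], temps[29], temps[70])
```

If that assumption holds, the teacher temperature has been fixed at 0.04 since epoch 30 and the lr warmup ended at epoch 10, so neither warmup lines up with the plateau at epoch 70.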
Thanks