xyzforever / BEVT

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
Apache License 2.0

Accuracy remains around 0.6% after 10 epochs #6

Closed JINXvvv closed 2 years ago

JINXvvv commented 2 years ago

I have loaded the pretrained model you provided, but the accuracy doesn't rise after 10 epochs.

cddlyf commented 2 years ago

@JINXvvv which accuracy do you mean? MIM acc or downstream task acc?

JINXvvv commented 2 years ago

I loaded win_base_image_stream_pretrain.pth for joint image-video pretraining. At epoch 32, mask_acc_2d is only around 3.5% and mask_acc_3d around 10%. Looking at the loss curves, loss_cls_2d no longer shows a downward trend, while loss_cls_3d is decreasing slowly. Is this normal?

xyzforever commented 2 years ago


@JINXvvv This is normal, according to our experimental data. In two-stream joint pretraining, we use the loss of the masked image modeling (MIM) task to preserve the spatial knowledge learned during image-stream pretraining. Therefore, during two-stream pretraining, the MIM loss converges faster than the masked video modeling (MVM) loss, because the model has already been trained on the MIM task during image-stream pretraining.
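The combined objective described above can be sketched as follows. This is a hypothetical illustration, not BEVT's actual code: the function name and the `lambda_2d` weight are assumptions; only the idea of summing the image-stream (MIM) and video-stream (MVM) losses comes from the comment above.

```python
def joint_pretrain_loss(loss_cls_2d, loss_cls_3d, lambda_2d=1.0):
    """Two-stream joint pretraining objective (sketch, hypothetical names):
    the MVM loss on the video stream plus a weighted MIM loss on the
    image stream, which preserves spatial knowledge from image-stream
    pretraining. lambda_2d is an assumed balancing weight."""
    return loss_cls_3d + lambda_2d * loss_cls_2d

# Example: a slowly decreasing MVM loss dominates a near-converged MIM loss.
total = joint_pretrain_loss(loss_cls_2d=0.5, loss_cls_3d=2.0)
print(total)  # → 2.5
```

Because the image stream was pretrained with MIM first, `loss_cls_2d` starts near its converged value and moves little, while `loss_cls_3d` keeps decreasing, matching the curves described in the question.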

JINXvvv commented 2 years ago

Then, under normal pretraining, what should mask_acc_2d and mask_acc_3d reach after 150 epochs? In my reproduction, mask_acc_2d is only 4% and mask_acc_3d is around 13%.

xyzforever commented 2 years ago


@JINXvvv This is consistent with our experiments, if you mean the "mask_acc" shown in our training logs. The mask_acc shown in our training logs is the MIM (or MVM) accuracy of the current batch only.
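For reference, the per-batch mask accuracy described above can be sketched like this. This is a minimal illustration under assumptions, not BEVT's actual code: it assumes the MIM/MVM head outputs per-position logits over a visual-token vocabulary, and all names here are hypothetical.

```python
import numpy as np

def batch_mask_accuracy(logits, targets, mask):
    """Fraction of masked positions in the current batch where the
    predicted visual token matches the tokenizer's target token.
    logits: (B, N, V) per-position scores over the visual vocabulary
    targets: (B, N) target token ids; mask: (B, N), 1 where masked."""
    preds = logits.argmax(axis=-1)           # predicted token id per position
    masked = mask.astype(bool)               # True where a patch was masked
    correct = (preds == targets) & masked    # hits among masked positions only
    return correct.sum() / max(masked.sum(), 1)

# Toy batch: 2 samples, 4 positions, vocabulary of 8 visual tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 8))
targets = rng.integers(0, 8, size=(2, 4))
mask = np.array([[1, 0, 1, 0], [0, 1, 1, 0]])
acc = batch_mask_accuracy(logits, targets, mask)
print(acc)  # accuracy over the 4 masked positions of this batch
```

Because this is computed per batch, the logged value is noisy; single-digit percentages early in pretraining are plausible given the large visual-token vocabulary.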