raymin0223 closed this issue 1 year ago.
Sorry for the late reply.
Q1: I retrained our model on MSVD-QA, and the "best test" result was around 53.4 (at epoch 6, with batch_size 20).
Q2: I have not tried 10 frames, but a lower result seems reasonable to me. Since we use 4 frames during pre-training, downstream fine-tuning should use a similar number of frames.
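To illustrate why the frame count can matter, here is a minimal sketch that assumes frames are sampled uniformly across each video. The thread does not say how the repository actually samples frames, and `sample_frame_indices` is a hypothetical helper. Under this assumption, changing the number of frames changes the temporal spacing the model sees, so fine-tuning with many more frames than were used in pre-training gives inputs at a different granularity.

```python
from typing import List


def sample_frame_indices(num_video_frames: int, size_frame: int) -> List[int]:
    """Pick `size_frame` frame indices spread uniformly over a video.

    Hypothetical helper: assumes segment-based uniform sampling (split the
    video into equal segments and take the middle frame of each). This is an
    illustration, not the repository's actual sampler.
    """
    if not 0 < size_frame <= num_video_frames:
        raise ValueError("size_frame must be between 1 and num_video_frames")
    segment = num_video_frames / size_frame  # frames per segment
    return [int(segment * i + segment / 2) for i in range(size_frame)]


# Compare the spacing for the pre-training setting (4) and the Q2 setting (10).
for n in (4, 10):
    idx = sample_frame_indices(100, n)
    print(n, idx, "stride:", idx[1] - idx[0])
```

For a 100-frame clip, the 4 sampled frames are 25 frames apart, while 10 sampled frames are only 10 apart.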
Thanks @tsujuifu! All my questions have been answered.
Hi @tsujuifu, thanks again for your great work.
When I ran the script for the MSVD-QA downstream task, I got a best test result of 51.49, which is lower than 54.6.
Q1. Do you have any idea what I missed? I didn't change any arguments in `_args/args_msvd-qa.json`, and I used the command `CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node=4 --master_port=5566 main_qaoe_tsv_mlm_head.py --config _args/args_msvd-qa.json`.

Q2. Increasing the frame size: I also tried increasing the `size_frame` argument to 10, and the downstream performance was lower than with just 5 frames. Is this expected? As I'm a beginner in this field, I would appreciate your insight.
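One way to run such a frame-count comparison without hand-editing the original config is to write a copy with only `size_frame` changed. This is a minimal sketch, assuming `size_frame` is a top-level key in `_args/args_msvd-qa.json` (the thread calls it an argument of that config but does not show the file); `write_frame_variant` and the output naming scheme are hypothetical.

```python
import json
from pathlib import Path


def write_frame_variant(base_config: str, size_frame: int, out_dir: str) -> Path:
    """Copy a JSON config, overriding only the `size_frame` value.

    Hypothetical helper: assumes `size_frame` is a top-level key. It raises
    instead of silently adding the key if that assumption does not hold.
    """
    cfg = json.loads(Path(base_config).read_text())
    if "size_frame" not in cfg:
        raise KeyError(f"'size_frame' not found in {base_config}")
    cfg["size_frame"] = size_frame
    out = Path(out_dir) / f"{Path(base_config).stem}_f{size_frame}.json"
    out.parent.mkdir(parents=True, exist_ok=True)  # create the sweep folder if needed
    out.write_text(json.dumps(cfg, indent=2))
    return out
```

For example, `write_frame_variant("_args/args_msvd-qa.json", 10, "_args/sweep")` would write `_args/sweep/args_msvd-qa_f10.json`, which can then be passed to `--config` in the same launch command, keeping every other setting identical between runs.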