Closed zxccade closed 2 months ago
I couldn't reproduce the results reported in the paper on the Charades dataset with the SF+CLIP features and default hyperparameters.
Hello! Sorry for the late reply; we have been busy with the rebuttal period.
Unfortunately, we currently only have the config file for QVHighlights for QD-DETR. We suggest tuning the learning rate ∈ {1e-4, 2e-4} and the saliency loss ratio (lw_saliency) ∈ {1, 4}.
We have also found that results are not consistent across different machines with our codebase (we haven't figured out why), so we recommend trying different parameters on your specific machine.
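The suggested sweep can be scripted as a small grid, e.g. (a sketch only: the `train.py` invocation is commented out, and its path and `--exp_id` naming are assumptions based on the QD-DETR codebase, not confirmed here):

```shell
# Grid over the two hyper-parameters suggested above.
for lr in 1e-4 2e-4; do
  for lw in 1 4; do
    echo "Running lr=${lr} lw_saliency=${lw}"
    # Hypothetical launch command; adjust the path/flags for your checkout:
    # PYTHONPATH=$PYTHONPATH:. python qd_detr/train.py \
    #   --lr ${lr} --lw_saliency ${lw} --exp_id charades_lr${lr}_lw${lw}
  done
done
```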
Thanks for getting back to me despite your busy schedule.
I've tried the parameters you mentioned, but the model only converges to 45 at R1@0.5, which is far from the numbers in the paper. However, when I use I3D features instead of SF+CLIP, the model converges to 53 at R1@0.5, which is close to the paper.
I guess I need to change some other parameters. It would be great if you could share the opt.json file after your busy rebuttal period. Thanks.
@wjun0830 @zxccade Hi, I also cannot reproduce the Charades-STA experiments with the default parameters. The results are below: R1@0.5 ≈ 0.52, far from the 0.57 reported in the paper. Could you share the hyper-parameters?
"MR-full-R1@0.5": 51.88,
"MR-full-R1@0.7": 27.93,
"MR-full-mAP": 30.26,
"MR-full-mAP@0.5": 60.92,
"MR-full-mAP@0.75": 25.31
Again, we are very sorry that our codebase is not robust to different server settings.
Have you tried changing the parameters as suggested above? You can also refer to the hyper-parameters of the follow-up paper (CG-DETR), which are listed in the appendix of https://arxiv.org/abs/2311.08835.
Thank you for the reply! I will try the hyper-parameters used in CG-DETR; give me a few hours to test...
@wjun0830 Sorry to bother you again. Below are the hyper-parameters from CG-DETR. Apart from the CG-DETR-specific parameters, did you use these values? Several of them differ from QD-DETR: for example, enc_layers=3, dec_layers=3, --lr 0.0002, and --lw_saliency 4 here, whereas QD-DETR uses enc_layers=2, dec_layers=2, lr=0.0001, and lw_saliency=1.0.
#### training
```shell
bsz=32
eval_bsz=32
num_dummies=45
num_prompts=2
total_prompts=10
lr_drop=400
enc_layers=3
dec_layers=3
t2v_layers=2
dummy_layers=2
moment_layers=1
sent_layers=1
PYTHONPATH=$PYTHONPATH:. python cg_detr/train.py \
--dset_name ${dset_name} \
--ctx_mode ${ctx_mode} \
--train_path ${train_path} \
--eval_path ${eval_path} \
--eval_split_name ${eval_split_name} \
--v_feat_dirs ${v_feat_dirs[@]} \
--v_feat_dim ${v_feat_dim} \
--t_feat_dir ${t_feat_dir} \
--t_feat_dim ${t_feat_dim} \
--bsz ${bsz} \
--results_root ${results_root} \
--exp_id ${exp_id} \
--max_v_l -1 \
--clip_length 1 \
--lr 0.0002 \
--lr_drop ${lr_drop} \
--n_epoch 200 \
--contrastive_align_loss_coef 0.002 \
--lw_saliency 4 \
--enc_layers ${enc_layers} \
--dec_layers ${dec_layers} \
--t2v_layers ${t2v_layers} \
--moment_layers ${moment_layers} \
--dummy_layers ${dummy_layers} \
--sent_layers ${sent_layers} \
--eval_bsz ${eval_bsz} \
--num_dummies ${num_dummies} \
--num_prompts ${num_prompts} \
--total_prompts ${total_prompts} \
${@:1}
```
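For quick reference, the parameter differences called out in the question can be printed side by side (the values below are taken from this thread only, not read from any released opt.json):

```shell
# Side-by-side of the defaults discussed in this thread (QD-DETR vs. CG-DETR).
printf "%-12s %-8s %-8s\n" param QD-DETR CG-DETR
printf "%-12s %-8s %-8s\n" enc_layers 2 3
printf "%-12s %-8s %-8s\n" dec_layers 2 3
printf "%-12s %-8s %-8s\n" lr 0.0001 0.0002
printf "%-12s %-8s %-8s\n" lw_saliency 1.0 4
```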
I remember that we haven't changed the number of layers in QD-DETR. Modifying the number of layers was done only in CG-DETR, following works at ICCV. lw_saliency and lr are the parameters we tuned for Charades.
Thank you for your help. With lr_drop=40, lr=0.0002, and lw_saliency=4, we finally reproduced the paper results.
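For anyone else landing here, the working overrides from this thread can be collected in one place (a sketch: the launcher path below is hypothetical, and the flags are forwarded to train.py via the script's trailing `${@:1}`):

```shell
# Overrides that reproduced the Charades-STA results per this thread.
OVERRIDES="--lr 0.0002 --lr_drop 40 --lw_saliency 4"
echo "extra args: ${OVERRIDES}"
# Hypothetical launcher name; substitute the repo's actual script:
# bash scripts/train_charades.sh ${OVERRIDES}
```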
Hi,
May I ask what hyper-parameter settings you used with the SF+CLIP features on Charades-STA? Could you please share the opt.json file?