wjun0830 / QD-DETR

Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
https://arxiv.org/abs/2303.13874

The hyperparameters for using SF+CLIP features on Charades-STA #36

Closed zxccade closed 2 months ago

zxccade commented 5 months ago

Hi,

May I ask what the hyperparameter settings are when using SF+CLIP features on Charades-STA? Could you please provide the opt.json file?

zxccade commented 5 months ago

I couldn't reproduce the results reported in the paper on the Charades dataset with the SF+CLIP features and default hyperparameters.

wjun0830 commented 5 months ago

Hello! Sorry for the late reply; we have been busy with the rebuttal period.

Unfortunately, we currently only have the config file for QVHighlights for QD-DETR. We suggest tuning the learning rate ∈ {1e-4, 2e-4} and the saliency loss ratio ∈ {1, 4}.

We also find that results are not consistent across machines due to some issue with our codebase that we haven't figured out yet. So we recommend trying different parameters on your specific machine.
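The suggested sweep is small enough to script. A minimal sketch follows; the `qd_detr/train.py` path, flag names, and `--exp_id` values mirror the repo's QVHighlights script and are assumptions here, not a verified config:

```shell
# Grid over the two parameters suggested above: learning rate and
# saliency loss ratio. Echoes each command instead of launching
# training; add real data paths and pipe to `sh` to actually run.
for lr in 1e-4 2e-4; do
  for lw_saliency in 1 4; do
    echo "python qd_detr/train.py --lr ${lr} --lw_saliency ${lw_saliency} --exp_id charades_lr${lr}_sal${lw_saliency}"
  done
done
```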

zxccade commented 5 months ago

Thanks for getting back to me during your busy schedule.

I've tried the parameters you mentioned, but the model could only converge to 45 at R1@0.5, which is far from the numbers in the paper. However, when I tried I3D features instead of SF+CLIP, the model could converge to 53 at R1@0.5, which is close to the numbers in the paper.

I guess I need to change some parameters. It would be better if you could help me find the opt.json file after your busy rebuttal period. Thanks.

awkrail commented 2 months ago

@wjun0830 @zxccade Hi, I also cannot reproduce the Charades-STA experiments with the default parameters. The results are below: R1@0.5 = 51.88, far from the 57 reported in the paper. Could you share the hyper-parameters?

```json
"MR-full-R1@0.5": 51.88,
"MR-full-R1@0.7": 27.93,
"MR-full-mAP": 30.26,
"MR-full-mAP@0.5": 60.92,
"MR-full-mAP@0.75": 25.31
```

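For context, the size of the gap can be computed directly from the numbers above; the paper figure of 57 for R1@0.5 is the one mentioned in this thread, not an official dump:

```shell
# Compare the reproduced R1@0.5 (51.88, from the results above)
# against the paper's reported 57; awk handles the float arithmetic.
reproduced=51.88
paper=57
awk -v r="$reproduced" -v p="$paper" \
  'BEGIN { printf "R1@0.5 gap vs. paper: %.2f points\n", p - r }'
```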
wjun0830 commented 2 months ago

We are again very sorry that our codebase is not very robust to different server settings.

Have you tried changing the params as above? You can also refer to the parameters for the succeeding paper, which can be found in the appendix of https://arxiv.org/abs/2311.08835.

awkrail commented 2 months ago

Thank you for your reply! I will try the hyper-parameters used in CG-DETR; let me take several hours to test it...

awkrail commented 2 months ago

@wjun0830 Sorry for bothering you again. Below are the hyper-parameters from CG-DETR. Setting aside the CG-DETR-specific parameters, did you use these values? There are several differences from QD-DETR: for example, enc_layers=3, dec_layers=3, --lr 0.0002, and --lw_saliency=4 are set here (in QD-DETR, enc_layers=2, dec_layers=2, lr=0.0001, lw_saliency=1.0).

```shell
#### training
bsz=32
eval_bsz=32
num_dummies=45
num_prompts=2
total_prompts=10
lr_drop=400
enc_layers=3
dec_layers=3
t2v_layers=2
dummy_layers=2
moment_layers=1
sent_layers=1

PYTHONPATH=$PYTHONPATH:. python cg_detr/train.py \
--dset_name ${dset_name} \
--ctx_mode ${ctx_mode} \
--train_path ${train_path} \
--eval_path ${eval_path} \
--eval_split_name ${eval_split_name} \
--v_feat_dirs ${v_feat_dirs[@]} \
--v_feat_dim ${v_feat_dim} \
--t_feat_dir ${t_feat_dir} \
--t_feat_dim ${t_feat_dim} \
--bsz ${bsz} \
--results_root ${results_root} \
--exp_id ${exp_id} \
--max_v_l -1 \
--clip_length 1 \
--lr 0.0002 \
--lr_drop ${lr_drop} \
--n_epoch 200 \
--contrastive_align_loss_coef 0.002 \
--lw_saliency 4 \
--enc_layers ${enc_layers} \
--dec_layers ${dec_layers} \
--t2v_layers ${t2v_layers} \
--moment_layers ${moment_layers} \
--dummy_layers ${dummy_layers} \
--sent_layers ${sent_layers} \
--eval_bsz ${eval_bsz} \
--num_dummies ${num_dummies} \
--num_prompts ${num_prompts} \
--total_prompts ${total_prompts} \
${@:1}
```

wjun0830 commented 2 months ago

I remember that we haven't changed the number of layers in QD-DETR. Modifying the number of layers was implemented only in CG-DETR, following works at ICCV. I remember lw_saliency and lr are the parameters we tuned for Charades.

awkrail commented 2 months ago

Thank you for your help. With lr_drop=40, lr=0.0002, and lw_saliency=4, we finally reproduced the paper results.
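Putting the thread's conclusion together: the combination that reproduced the paper was lr=0.0002, lr_drop=40, lw_saliency=4, with the QD-DETR layer defaults (enc_layers=2, dec_layers=2) unchanged. A dry-run sketch; the `qd_detr/train.py` path and `--dset_name` value are assumptions, and dataset/feature paths are omitted:

```shell
# Settings that worked per this thread; echoed as a dry run rather
# than launched, since real data paths must still be supplied.
lr=0.0002
lr_drop=40
lw_saliency=4
enc_layers=2
dec_layers=2

echo "PYTHONPATH=\$PYTHONPATH:. python qd_detr/train.py \
--dset_name charadesSTA \
--lr ${lr} --lr_drop ${lr_drop} --lw_saliency ${lw_saliency} \
--enc_layers ${enc_layers} --dec_layers ${dec_layers}"
```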