wjun0830 / QD-DETR

Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
https://arxiv.org/abs/2303.13874

Cannot fully reproduce the reported video-only results on QVHighlights with the default configs #15

Closed. medivhna closed this issue 1 year ago

medivhna commented 1 year ago

Hello, I tried to reproduce the video-only results with the official source code, but got weaker results, as follows:

test_MR-full-R1@0.5: 59.99
test_MR-full-R1@0.7: 41.31
test_MR-full-mAP: 36.60
test_MR-full-mAP@0.5: 60.45
test_MR-full-mAP@0.75: 35.78
test_MR-long-mAP: 44.37
test_MR-middle-mAP: 36.94
test_MR-short-mAP: 7.32
test_HL-min-VeryGood-mAP: 38.56
test_HL-min-Good-mAP: 63.94
test_HL-min-Good-Hit1: 73.28
test_HL-min-Fair-mAP: 74.76
test_HL-min-VeryGood-Hit1: 61.54
test_HL-min-Fair-Hit1: 75.10

val_MR-full-R1@0.5: 61.94
val_MR-full-R1@0.7: 44.06
val_MR-full-mAP: 38.86
val_MR-full-mAP@0.5: 61.13
val_MR-full-mAP@0.75: 39.06
val_MR-long-mAP: 44.46
val_MR-middle-mAP: 41.29
val_MR-short-mAP: 7.31
val_HL-min-VeryGood-mAP: 39.33
val_HL-min-Good-mAP: 63.67
val_HL-min-Good-Hit1: 73.42
val_HL-min-Fair-mAP: 74.61
val_HL-min-VeryGood-Hit1: 62.90
val_HL-min-Fair-Hit1: 75.42
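
To make the gap between the two splits easier to read, a log line in this `name: value` format can be parsed into a dictionary and compared per metric. The following is only a minimal sketch: `parse_metrics` and `split_gap` are hypothetical helper names, not part of the QD-DETR repository, and the sample line reuses four of the values above.

```python
import re
from typing import Dict


def parse_metrics(line: str) -> Dict[str, float]:
    """Parse 'name: value name: value ...' into {name: value}."""
    pairs = re.findall(r"(\S+):\s*(-?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}


def split_gap(metrics: Dict[str, float]) -> Dict[str, float]:
    """Return (val - test) for every metric reported on both splits."""
    gaps = {}
    for name, test_value in metrics.items():
        if not name.startswith("test_"):
            continue
        metric = name[len("test_"):]
        val_key = "val_" + metric
        if val_key in metrics:
            gaps[metric] = round(metrics[val_key] - test_value, 2)
    return gaps


# Sample input: a subset of the values reported in this issue.
line = (
    "test_MR-full-R1@0.5: 59.99 test_MR-full-mAP: 36.60 "
    "val_MR-full-R1@0.5: 61.94 val_MR-full-mAP: 38.86"
)
metrics = parse_metrics(line)
gaps = split_gap(metrics)
for metric, gap in gaps.items():
    print(f"{metric}: val - test = {gap:+.2f}")
```

The same dictionary could also be compared against the numbers reported in the paper, which are not reproduced here.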

My train.sh script is nearly identical to the one provided:

dset_name=hl
ctx_mode=video_tef
v_feat_types=slowfast_clip
t_feat_type=clip 
results_root=results
exp_id=bs32_baseline

######## data paths
train_path=data/highlight_train_release.jsonl
eval_path=data/highlight_val_release.jsonl
eval_split_name=val

######## setup video+text features
feat_root=./features

# video features
v_feat_dim=0
v_feat_dirs=()
if [[ ${v_feat_types} == *"slowfast"* ]]; then
  v_feat_dirs+=(${feat_root}/slowfast_features)
  (( v_feat_dim += 2304 ))  # double parentheses for arithmetic; no need to write ${v_feat_dim}
fi
if [[ ${v_feat_types} == *"clip"* ]]; then
  v_feat_dirs+=(${feat_root}/clip_features)
  (( v_feat_dim += 512 ))
fi

# text features
if [[ ${t_feat_type} == "clip" ]]; then
  t_feat_dir=${feat_root}/clip_text_features/
  t_feat_dim=512
else
  echo "Wrong arg for t_feat_type."
  exit 1
fi

#### training
bsz=32

CUDA_VISIBLE_DEVICES=7 PYTHONPATH=$PYTHONPATH:. python qd_detr/train.py \
--dset_name ${dset_name} \
--ctx_mode ${ctx_mode} \
--train_path ${train_path} \
--eval_path ${eval_path} \
--eval_split_name ${eval_split_name} \
--v_feat_dirs ${v_feat_dirs[@]} \
--v_feat_dim ${v_feat_dim} \
--t_feat_dir ${t_feat_dir} \
--t_feat_dim ${t_feat_dim} \
--bsz ${bsz} \
--results_root ${results_root} \
--exp_id ${exp_id} \
${@:1}

My GPU is an NVIDIA A100 and my PyTorch version is 1.12. What could be causing the gap in my reproduction?

wjun0830 commented 1 year ago

Hello. Could you try the library versions listed in requirements.txt?

Although the implementation may have changed slightly while we reorganized the code, the checkpoint provided in this repository was obtained with the published version. If you haven't changed any code, the gap may be due to a different machine or different library versions.

medivhna commented 1 year ago

OK, I will try again with the dependencies from requirements.txt and report back as soon as possible.
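
One way to carry out the check suggested above is to compare the installed package versions against the pins in requirements.txt. The following is a minimal sketch using only the Python standard library. It assumes the file uses simple `name==version` lines; `parse_pins`, `installed_version` and `find_mismatches` are hypothetical helper names, and the package names in the comments are placeholders, not the repository's actual pins.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or range requirements are skipped in this sketch
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Ignore a local build suffix such as '+cu113' when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Usage, assuming requirements.txt is in the current directory:
#   with open("requirements.txt") as f:
#       for name, (wanted, found) in find_mismatches(parse_pins(f.read())).items():
#           print(f"{name}: wanted {wanted}, found {found}")
```

An empty result means every pinned package matches. It does not rule out differences caused by the GPU, the driver or the CUDA toolkit.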