showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
https://arxiv.org/abs/2307.16715
MIT License

Training Detail for Fine-tuning? #30

Closed yhl2018 closed 10 months ago

yhl2018 commented 11 months ago

Thanks for your work. Some training details are not described very clearly in the README, so I would like to ask: what are the downstream tasks, the training parameters, and the corresponding training methods?

QinghongLin commented 11 months ago

Please check the Google Drive link; there you will find each config together with its checkpoints, and the file opt.json records all parameters. By the way, all parameters are also reported in our paper's supplementary material.
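To compare a downloaded config against your own run, a minimal sketch like the following can help (this helper is not part of the repo; the key names are assumed to follow the launcher flags, and the path is whatever you saved the `opt.json` from the Drive folder as):

```python
import json

def load_opt(path):
    """Load an opt.json config downloaded from the release folder."""
    with open(path) as f:
        return json.load(f)

def summarize(opt, keys=("lr", "n_epoch", "bsz", "f_loss_coef", "s_loss_intra_coef")):
    """Report only the keys that exist in this opt.json (key names assumed
    to match the command-line flags of train_mr.py)."""
    return {k: opt[k] for k in keys if k in opt}
```

Usage would be e.g. `summarize(load_opt("opt.json"))`, printed side by side for the pretrained and from-scratch configs.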

yhl2018 commented 11 months ago

Thank you for your response, but while going through the downstream tasks, e.g. the Ego4D (NLQ) dataset, I found that the metadata expected by opt.json (first screenshot) and the downloaded data (second screenshot) do not match.

QinghongLin commented 11 months ago

Hi, the first figure is for NLQ downstream fine-tuning, which requires nlq_train/val.jsonl; the second figure, point_egoclip_wo_val.jsonl, is for large-scale point-wise pretraining. You should download nlq_train/val.jsonl from here (screenshots below):

yhl2018 commented 11 months ago

I see, thanks for your reply

yhl2018 commented 11 months ago

For NLQ without pretraining, using the reference configuration (screenshots attached), my best result is 5.68 / 3.2 / 1.16, which is still far from the results reported in the paper. Are the parameters different with and without pretraining?

QinghongLin commented 11 months ago

Yes, the parameter settings differ between the two settings. For the best result, please first try different values of f (f_loss_coef); then fix f and try different values of s1 (s_loss_intra_coef); then fix s1 and try different values of s2 (s_loss_inter_coef).

I will update the parameters without pt later.
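The coordinate-wise sweep described above can be sketched as follows (a hypothetical helper, not from the repo; the flag names come from the launcher script in this thread, and the grids and "best" values are placeholders you would pick from your own validation metric):

```python
def coordinate_sweep(f_grid, s1_grid, s2_grid, best_f=50, best_s1=0.1):
    """Yield command-line fragments for a coordinate-wise search:
    first sweep f, then fix f and sweep s1, then fix s1 and sweep s2."""
    for f in f_grid:
        yield f"--f_loss_coef {f} --s_loss_intra_coef 0 --s_loss_inter_coef 0"
    # after the f sweep, fix the best f (placeholder: 50) and sweep s1
    for s1 in s1_grid:
        yield f"--f_loss_coef {best_f} --s_loss_intra_coef {s1} --s_loss_inter_coef 0"
    # finally fix s1 (placeholder: 0.1) and sweep s2
    for s2 in s2_grid:
        yield f"--f_loss_coef {best_f} --s_loss_intra_coef {best_s1} --s_loss_inter_coef {s2}"

for flags in coordinate_sweep([1, 10, 50], [0.05, 0.1, 0.5], [0.5, 1, 2]):
    print(flags)
```

Each printed fragment would be appended to the `train_mr.py` launch command, keeping the run whose MR-full-R1@0.3 on the val split is best at each stage.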

yhl2018 commented 11 months ago

Thank you for your prompt reply; I look forward to seeing the best model parameters.

QinghongLin commented 11 months ago

@yhl2018 Hi, can you attach the complete bash script you ran for this experiment? On my side, the parameter setting (w/o PT) f50_b10g1_s0.1_1 achieves 7.28 R1@0.3 on NLQ.

yhl2018 commented 10 months ago

NLQ without pretraining:

```bash
#!/bin/bash
#SBATCH --job-name=qvhl
#SBATCH --output=/fsx/qinghonglin/univtg/log/qvhl_ft.log
#SBATCH --partition=learnai4rl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --account all

export NCCL_SOCKET_IFNAME=ens32
export NCCL_NSOCKS_PERTHREAD=4
export NCCL_SOCKET_NTHREADS=2

dset_type=mr
dset_name=ego4d
clip_length=2

gpu_id=0
num_workers=16

exp_id=aio_unified_epo6__f50_b10g1_s0.1_1
model_id=univtg

bsz=32
eval_bsz=4
n_epoch=200
lr=1e-4
lr_drop=80
lr_warmup=10
wd=1e-4

input_dropout=0.5
dropout=0
droppath=0.1

eval_epoch=5
enc_layers=4
eval_mode=add
round_multiple=-1
hidden_dim=1024

b_loss_coef=10
g_loss_coef=0
eos_coef=0.1
f_loss_coef=10
s_loss_intra_coef=0
s_loss_inter_coef=0

main_metric=MR-full-mAP-key
nms_thd=0.7
max_before_nms=1000

ctx_mode=video_tef
v_feat_types=slowfast_clip
t_feat_type=clip
use_cache=1
easy_negative_only=1

resume=None

# kill pid in gpu_id
ps -up `nvidia-smi -i ${gpu_id} -q -x | grep pid | sed -e 's/<pid>//g' -e 's/<\/pid>//g' -e 's/^[[:space:]]*//'` | awk '{print "kill -9 " $2;}' | sh

######## data paths
train_path=data/${dset_name}/metadata/nlq_train.jsonl
eval_path=data/${dset_name}/metadata/nlq_val.jsonl
eval_split_name=val
feat_root=data/${dset_name}

#### video features
v_feat_dim=0
v_feat_dirs=()
if [[ ${v_feat_types} == *"slowfast"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_slowfast)
  (( v_feat_dim += 2304 ))  # double brackets for arithmetic op, no need to use ${v_feat_dim}
fi
if [[ ${v_feat_types} == *"i3d"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_i3d)
  (( v_feat_dim += 1024 ))
fi
if [[ ${v_feat_types} == *"c3d"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_c3d)
  (( v_feat_dim += 500 ))
fi
if [[ ${v_feat_types} == *"clip"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_clip)
  (( v_feat_dim += 512 ))
fi

#### text features
if [[ ${t_feat_type} == "clip" ]]; then
  t_feat_dir=${feat_root}/txt_clip
  t_feat_dim=512
else
  echo "Wrong arg for t_feat_type."
  exit 1
fi

python train_mr.py \
  --dset_type ${dset_type} \
  --dset_name ${dset_name} \
  --clip_length ${clip_length} \
  --exp_id ${exp_id} \
  --gpu_id ${gpu_id} \
  --model_id ${model_id} \
  --v_feat_types ${v_feat_types} \
  --t_feat_type ${t_feat_type} \
  --ctx_mode ${ctx_mode} \
  --train_path ${train_path} \
  --eval_path ${eval_path} \
  --eval_split_name ${eval_split_name} \
  --eval_epoch ${eval_epoch} \
  --v_feat_dirs ${v_feat_dirs[@]} \
  --v_feat_dim ${v_feat_dim} \
  --t_feat_dir ${t_feat_dir} \
  --t_feat_dim ${t_feat_dim} \
  --input_dropout ${input_dropout} \
  --dropout ${dropout} \
  --droppath ${droppath} \
  --bsz ${bsz} \
  --eval_bsz ${eval_bsz} \
  --n_epoch ${n_epoch} \
  --num_workers ${num_workers} \
  --lr ${lr} \
  --lr_drop ${lr_drop} \
  --lr_warmup ${lr_warmup} \
  --wd ${wd} \
  --use_cache ${use_cache} \
  --enc_layers ${enc_layers} \
  --main_metric ${main_metric} \
  --nms_thd ${nms_thd} \
  --easy_negative_only ${easy_negative_only} \
  --max_before_nms ${max_before_nms} \
  --b_loss_coef ${b_loss_coef} \
  --g_loss_coef ${g_loss_coef} \
  --eos_coef ${eos_coef} \
  --f_loss_coef ${f_loss_coef} \
  --s_loss_intra_coef ${s_loss_intra_coef} \
  --s_loss_inter_coef ${s_loss_inter_coef} \
  --eval_mode ${eval_mode} \
  --round_multiple ${round_multiple} \
  --hidden_dim ${hidden_dim} \
  --eval_init \
  --resume ${resume} \
  ${@:1}
```

QinghongLin commented 10 months ago

@yhl2018 Hi! As you posted, your parameters are:

```bash
b_loss_coef=10
g_loss_coef=0
eos_coef=0.1
f_loss_coef=10
s_loss_intra_coef=0
s_loss_inter_coef=0
```

With this setting the saliency head does not contribute at all. Can you adjust the parameters to match f50_b10g1_s0.1_1 and give it a try? Then let me know the result.

```bash
b_loss_coef=10
g_loss_coef=1
eos_coef=0.1
f_loss_coef=50
s_loss_intra_coef=0.1
s_loss_inter_coef=1
```

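For reference, the experiment tag appears to encode exactly these coefficients; a hypothetical decoder (the naming convention below is inferred from this thread, not an official part of the repo) would be:

```python
import re

def parse_exp_id(exp_id):
    """Decode a tag like 'f50_b10g1_s0.1_1' into loss coefficients.

    Assumed convention (inferred from this thread):
      f<k>     -> f_loss_coef = k
      b<k>g<k> -> b_loss_coef, g_loss_coef
      s<a>_<b> -> s_loss_intra_coef = a, s_loss_inter_coef = b
    """
    m = re.fullmatch(
        r"f(?P<f>[\d.]+)_b(?P<b>[\d.]+)g(?P<g>[\d.]+)_s(?P<s1>[\d.]+)_(?P<s2>[\d.]+)",
        exp_id,
    )
    if m is None:
        raise ValueError(f"unrecognized exp_id: {exp_id}")
    return {
        "f_loss_coef": float(m["f"]),
        "b_loss_coef": float(m["b"]),
        "g_loss_coef": float(m["g"]),
        "s_loss_intra_coef": float(m["s1"]),
        "s_loss_inter_coef": float(m["s2"]),
    }
```

So `parse_exp_id("f50_b10g1_s0.1_1")` yields f=50, b=10, g=1, s_intra=0.1, s_inter=1, matching the list above.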
yhl2018 commented 10 months ago

Okay, maybe I missed something. Thank you

yhl2018 commented 10 months ago

Perfect reproduction. Thanks again for the guidance. At epoch 55:

```
2023-11-13 12:19:24.472:INFO:main - metrics_no_nms OrderedDict([
  ('MR-full-R1@0.3-key', 7.23), ('MR-full-R1@0.5-key', 3.87), ('MR-full-R1@0.7-key', 1.55),
  ('MR-full-R5@0.3-key', 13.06), ('MR-full-R5@0.5-key', 7.8), ('MR-full-R5@0.7-key', 3.98),
  ('MR-full-mAP-key', 2.58), ('MR-full-mAP@0.5-key', 5.62), ('MR-full-mAP@0.75-key', 2.07),
  ('MR-full-mIoU-key', 5.21), ('MR-long-mAP', 7.43), ('MR-long-mIoU', 13.89),
  ('MR-middle-mAP', 5.16), ('MR-middle-mIoU', 9.63),
  ('MR-short-mAP', 1.97), ('MR-short-mIoU', 4.22)])
2023-11-13 12:19:24.473:INFO:main - metrics_nms OrderedDict([
  ('MR-full-R1@0.3-key', 7.23), ('MR-full-R1@0.5-key', 3.87), ('MR-full-R1@0.7-key', 1.55),
  ('MR-full-R5@0.3-key', 16.34), ('MR-full-R5@0.5-key', 9.94), ('MR-full-R5@0.7-key', 4.72),
  ('MR-full-mAP-key', 2.86), ('MR-full-mAP@0.5-key', 6.39), ('MR-full-mAP@0.75-key', 2.31),
  ('MR-full-mIoU-key', 5.21), ('MR-long-mAP', 9.49), ('MR-long-mIoU', 13.89),
  ('MR-middle-mAP', 5.86), ('MR-middle-mIoU', 9.63),
  ('MR-short-mAP', 2.05), ('MR-short-mIoU', 4.22)])
2023-11-13 12:19:25.099:INFO:main - The checkpoint file has been updated.
```