Closed yhl2018 closed 10 months ago
Please click the Google Drive link; there you will see the configs with their checkpoints, where the config opt.json records all parameters.
btw, all parameters are also reported in the supplementary material of our paper.
Thank you for your response, but while working through the downstream tasks, such as the Ego4D (NLQ) dataset, I found that the metadata referenced in opt.json does not match the downloaded data.
Hi, the first figure is for NLQ downstream fine-tuning, which requires nlq_train/val.jsonl; the second figure, point_egoclip_wo_val.jsonl, is for large-scale point-wise pretraining. You should download nlq_train/val.jsonl from here:
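Based on the `train_path`/`eval_path` variables in the NLQ fine-tuning script posted later in this thread, the downloaded files appear to be expected in the following local layout (an assumption; adjust if your directory structure differs):

```shell
# Hypothetical local layout, inferred from train_path=data/ego4d/metadata/nlq_train.jsonl
# and eval_path=data/ego4d/metadata/nlq_val.jsonl in the NLQ fine-tuning script.
mkdir -p data/ego4d/metadata
# place the downloaded nlq_train.jsonl and nlq_val.jsonl into data/ego4d/metadata/
```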
I see, thanks for your reply
NLQ without pretraining, using the referenced configuration file, gives a best result of 5.68 / 3.2 / 1.16, which is still very different from the results in the paper. Are the parameters different with and without pretraining?
Yes, the parameter settings differ between the two settings. For the best result, please first try different values of f, then fix f and try different values of s1, then fix s1 and try different values of s2.
I will update the parameters without pretraining later.
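The stage-wise coordinate search described above could be sketched as follows. The candidate values are illustrative, not the authors' actual grids, and the flag names follow the `train_mr.py` invocation posted later in this thread; the loop only echoes the commands (a dry run) rather than launching training:

```shell
#!/bin/bash
# Hypothetical coordinate search over the UniVTG loss coefficients.
f_candidates=(10 20 50)
s1_candidates=(0.1 0.5 1)
s2_candidates=(0.1 0.5 1)

# Stage 1: sweep f_loss_coef with the saliency terms disabled.
for f in "${f_candidates[@]}"; do
  echo "python train_mr.py --f_loss_coef ${f} --s_loss_intra_coef 0 --s_loss_inter_coef 0 ..."
done

# Stage 2: fix the best f (suppose 50) and sweep s_loss_intra_coef.
best_f=50
for s1 in "${s1_candidates[@]}"; do
  echo "python train_mr.py --f_loss_coef ${best_f} --s_loss_intra_coef ${s1} --s_loss_inter_coef 0 ..."
done

# Stage 3: fix the best s1 (suppose 0.1) and sweep s_loss_inter_coef.
best_s1=0.1
for s2 in "${s2_candidates[@]}"; do
  echo "python train_mr.py --f_loss_coef ${best_f} --s_loss_intra_coef ${best_s1} --s_loss_inter_coef ${s2} ..."
done
```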
Thank you for your prompt reply; I look forward to seeing the best model parameters.
@yhl2018 Hi, can you attach the complete bash script you are running for this experiment? On my side, my parameter setting (w/o PT) f50_b10g1_s0.1_1 receives 7.28 R1@0.3 on NLQ.
NLQ without pretraining:

```bash
#!/bin/bash
export NCCL_SOCKET_IFNAME=ens32
export NCCL_NSOCKS_PERTHREAD=4
export NCCL_SOCKET_NTHREADS=2

dset_type=mr
dset_name=ego4d
clip_length=2

gpu_id=0
num_workers=16

exp_id=aio_unified_epo6__f50_b10g1_s0.1_1
model_id=univtg

bsz=32
eval_bsz=4
n_epoch=200
lr=1e-4
lr_drop=80
lr_warmup=10
wd=1e-4

input_dropout=0.5
dropout=0
droppath=0.1

eval_epoch=5
enc_layers=4
eval_mode=add
round_multiple=-1
hidden_dim=1024

b_loss_coef=10
g_loss_coef=0
eos_coef=0.1
f_loss_coef=10
s_loss_intra_coef=0
s_loss_inter_coef=0

main_metric=MR-full-mAP-key
nms_thd=0.7
max_before_nms=1000

ctx_mode=video_tef
v_feat_types=slowfast_clip
t_feat_type=clip
use_cache=1
easy_negative_only=1

# kill any stale processes holding the target GPU
nvidia-smi -i ${gpu_id} -q -x | grep pid | sed -e 's/<pid>//g' -e 's/<\/pid>//g' -e 's/^[[:space:]]*//' \
  | awk '{print "kill -9 " $2;}' | sh

######## data paths
train_path=data/${dset_name}/metadata/nlq_train.jsonl
eval_path=data/${dset_name}/metadata/nlq_val.jsonl
eval_split_name=val
feat_root=data/${dset_name}

# accumulate the video feature dimension; substring match since
# v_feat_types may combine several types (e.g. slowfast_clip)
v_feat_dim=0
v_feat_dirs=()
if [[ ${v_feat_types} == *"slowfast"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_slowfast)
  (( v_feat_dim += 2304 ))  # double brackets for arithmetic op, no need to use ${v_feat_dim}
fi
if [[ ${v_feat_types} == *"i3d"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_i3d)
  (( v_feat_dim += 1024 ))
fi
if [[ ${v_feat_types} == *"c3d"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_c3d)
  (( v_feat_dim += 500 ))
fi
if [[ ${v_feat_types} == *"clip"* ]]; then
  v_feat_dirs+=(${feat_root}/vid_clip)
  (( v_feat_dim += 512 ))
fi

if [[ ${t_feat_type} == "clip" ]]; then
  t_feat_dir=${feat_root}/txt_clip
  t_feat_dim=512
else
  echo "Wrong arg for t_feat_type."
  exit 1
fi

python train_mr.py \
  --dset_type ${dset_type} \
  --dset_name ${dset_name} \
  --clip_length ${clip_length} \
  --exp_id ${exp_id} \
  --gpu_id ${gpu_id} \
  --model_id ${model_id} \
  --v_feat_types ${v_feat_types} \
  --t_feat_type ${t_feat_type} \
  --ctx_mode ${ctx_mode} \
  --train_path ${train_path} \
  --eval_path ${eval_path} \
  --eval_split_name ${eval_split_name} \
  --eval_epoch ${eval_epoch} \
  --v_feat_dirs ${v_feat_dirs[@]} \
  --v_feat_dim ${v_feat_dim} \
  --t_feat_dir ${t_feat_dir} \
  --t_feat_dim ${t_feat_dim} \
  --input_dropout ${input_dropout} \
  --dropout ${dropout} \
  --droppath ${droppath} \
  --bsz ${bsz} \
  --eval_bsz ${eval_bsz} \
  --n_epoch ${n_epoch} \
  --num_workers ${num_workers} \
  --lr ${lr} \
  --lr_drop ${lr_drop} \
  --lr_warmup ${lr_warmup} \
  --wd ${wd} \
  --use_cache ${use_cache} \
  --enc_layers ${enc_layers} \
  --main_metric ${main_metric} \
  --nms_thd ${nms_thd} \
  --easy_negative_only ${easy_negative_only} \
  --max_before_nms ${max_before_nms} \
  --b_loss_coef ${b_loss_coef} \
  --g_loss_coef ${g_loss_coef} \
  --eos_coef ${eos_coef} \
  --f_loss_coef ${f_loss_coef} \
  --s_loss_intra_coef ${s_loss_intra_coef} \
  --s_loss_inter_coef ${s_loss_inter_coef} \
  --eval_mode ${eval_mode} \
  --round_multiple ${round_multiple} \
  --hidden_dim ${hidden_dim} \
  --eval_init \
  ${@:1}
```
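As a sanity check on the script's feature-dimension accumulation: with v_feat_types=slowfast_clip, the slowfast and clip branches both match, so v_feat_dim should end up as 2304 + 512 = 2816. A minimal standalone check of that logic:

```shell
#!/bin/bash
# Standalone check of the v_feat_dim accumulation used in the script above.
v_feat_types=slowfast_clip
v_feat_dim=0
if [[ ${v_feat_types} == *"slowfast"* ]]; then (( v_feat_dim += 2304 )); fi
if [[ ${v_feat_types} == *"clip"* ]]; then (( v_feat_dim += 512 )); fi
echo ${v_feat_dim}  # prints 2816
```

Note that an exact-equality test (`== "slowfast"`) would never match here, since the variable holds the combined string slowfast_clip; the substring patterns are what make the two branches fire.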
@yhl2018 Hi! As you posted, with the parameters
b_loss_coef=10
g_loss_coef=0
eos_coef=0.1
f_loss_coef=10
s_loss_intra_coef=0
s_loss_inter_coef=0
the saliency head does not contribute at all. Can you adjust the parameters to f50_b10g1_s0.1_1 and give it a try? Then let me know the result:
b_loss_coef=10
g_loss_coef=1
eos_coef=0.1
f_loss_coef=50
s_loss_intra_coef=0.1
s_loss_inter_coef=1
Okay, maybe I missed something. Thank you.
Perfect reproduction. Thanks again for the guidance. At epoch 55:

```
2023-11-13 12:19:24.472:INFO:main - metrics_no_nms OrderedDict([
  ('MR-full-R1@0.3-key', 7.23), ('MR-full-R1@0.5-key', 3.87), ('MR-full-R1@0.7-key', 1.55),
  ('MR-full-R5@0.3-key', 13.06), ('MR-full-R5@0.5-key', 7.8), ('MR-full-R5@0.7-key', 3.98),
  ('MR-full-mAP-key', 2.58), ('MR-full-mAP@0.5-key', 5.62), ('MR-full-mAP@0.75-key', 2.07),
  ('MR-full-mIoU-key', 5.21),
  ('MR-long-mAP', 7.43), ('MR-long-mIoU', 13.89),
  ('MR-middle-mAP', 5.16), ('MR-middle-mIoU', 9.63),
  ('MR-short-mAP', 1.97), ('MR-short-mIoU', 4.22)])
2023-11-13 12:19:24.473:INFO:main - metrics_nms OrderedDict([
  ('MR-full-R1@0.3-key', 7.23), ('MR-full-R1@0.5-key', 3.87), ('MR-full-R1@0.7-key', 1.55),
  ('MR-full-R5@0.3-key', 16.34), ('MR-full-R5@0.5-key', 9.94), ('MR-full-R5@0.7-key', 4.72),
  ('MR-full-mAP-key', 2.86), ('MR-full-mAP@0.5-key', 6.39), ('MR-full-mAP@0.75-key', 2.31),
  ('MR-full-mIoU-key', 5.21),
  ('MR-long-mAP', 9.49), ('MR-long-mIoU', 13.89),
  ('MR-middle-mAP', 5.86), ('MR-middle-mIoU', 9.63),
  ('MR-short-mAP', 2.05), ('MR-short-mIoU', 4.22)])
2023-11-13 12:19:25.099:INFO:main - The checkpoint file has been updated.
```
Thanks for your work. Some training details are not described very clearly in the README, so I would like to ask: what are the downstream tasks, their training parameters, and the corresponding training procedures?