whwu95 / Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
https://arxiv.org/abs/2301.00184
MIT License
225 stars, 16 forks

How do I know if the video features will be better after interacting with caption? #17

Closed shams2023 closed 9 months ago

fazlicodes commented 8 months ago

Hi @shams2023 were you able to reproduce the results for all the datasets?

shams2023 commented 8 months ago

> Hi @shams2023 were you able to reproduce the results for all the datasets?

No. Did you reproduce the results on the MSVD dataset?

dumbelldore commented 8 months ago

I have trained stage 1, but not stage 2 yet.

shams2023 commented 8 months ago

> I trained the stage 1, but not the stage 2 yet

May I ask how you configured the training script and how to set it up? I have a single RTX 3090. This is my current file: (image)

dumbelldore commented 8 months ago

This is my setup for MSVD:

DATA_PATH=./Cap4Video/MSVD-Frames python -m torch.distributed.launch --nproc_per_node=2 --master_port 2963 \
  train_video.py \
  --do_train --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 \
  --data_path ./Cap4Video/MSVD-Frames/Frames \
  --features_path ./Cap4Video/MSVD-Frames/Frames/MSVD_frames \
  --output_dir ckpts/MSVD-resume \
  --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
  --datatype msvd \
  --feature_framerate 1 --coef_lr 1e-3 \
  --freeze_layer_num 0 --slice_framepos 2 \
  --loose_type --linear_patch 2d --sim_header seqTransf \
  --strategy 2 \
  --pretrained_clip_name ViT-B/32 \
  --interaction wti --text_pool_type transf_avg \
  --world_size 2

--nproc_per_node and --world_size should both equal the number of GPUs; in your case that is 1.
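
For a single card, the advice above amounts to changing two values. A minimal sketch, assuming the same flags and paths as the two-GPU MSVD command: NUM_GPUS and DRY_RUN are illustrative variable names, not part of the repository, and the thread does not confirm that --batch_size=128 fits in the memory of one 3090, so it may need to be lowered.

```shell
#!/bin/sh
# Sketch only: single-GPU variant of the MSVD stage-1 launch command.
# NUM_GPUS and DRY_RUN are hypothetical helper variables for this example.
NUM_GPUS=1                    # one RTX 3090
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the command, 0 = actually run it

# Build the command once so both GPU-count flags stay in sync.
set -- python -m torch.distributed.launch \
  --nproc_per_node="${NUM_GPUS}" --master_port 2963 \
  train_video.py \
  --do_train --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 \
  --data_path ./Cap4Video/MSVD-Frames/Frames \
  --features_path ./Cap4Video/MSVD-Frames/Frames/MSVD_frames \
  --output_dir ckpts/MSVD-resume \
  --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
  --datatype msvd \
  --feature_framerate 1 --coef_lr 1e-3 \
  --freeze_layer_num 0 --slice_framepos 2 \
  --loose_type --linear_patch 2d --sim_header seqTransf \
  --strategy 2 \
  --pretrained_clip_name ViT-B/32 \
  --interaction wti --text_pool_type transf_avg \
  --world_size "${NUM_GPUS}"

if [ "${DRY_RUN}" = "1" ]; then
  echo "$*"                   # show what would be launched
else
  "$@"                        # launch training
fi
```

Run it as-is to print the command and check the paths, then rerun with DRY_RUN=0 to start training.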

shams2023 commented 8 months ago

Thank you for your help! I hope we can stay in touch.

shams2023 commented 8 months ago

With your configuration, can the first-stage train_video.py run? I ask because this configuration seems to lack a lot of path information compared with the MSRVTT one. Would it be possible to share a script that runs successfully? You can send it to my email (2012664144@qq.com). I would like to debug it and see how the pipeline actually runs. Sorry to bother you again, and thank you very much for your help!