How do I know if the video features will be better after interacting with caption?

fazlicodes commented 8 months ago

Hi @shams2023, were you able to reproduce the results for all the datasets?

shams2023 commented 8 months ago

No. Did you reproduce the MSVD results?

dumbelldore commented 8 months ago

I trained stage 1, but not stage 2 yet.

shams2023 commented 8 months ago

Could you tell me how you configured and set things up? I have a single RTX 3090. Here is my file:

dumbelldore commented 8 months ago

This is my setup for MSVD:

DATA_PATH=./Cap4Video/MSVD-Frames python -m torch.distributed.launch --nproc_per_node=2 --master_port 2963 \
    train_video.py \
    --do_train --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 \
    --data_path ./Cap4Video/MSVD-Frames/Frames \
    --features_path ./Cap4Video/MSVD-Frames/Frames/MSVD_frames \
    --output_dir ckpts/MSVD-resume \
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
    --datatype msvd \
    --feature_framerate 1 --coef_lr 1e-3 \
    --freeze_layer_num 0 --slice_framepos 2 \
    --loose_type --linear_patch 2d --sim_header seqTransf \
    --strategy 2 \
    --pretrained_clip_name ViT-B/32 \
    --interaction wti --text_pool_type transf_avg \
    --world_size 2 ;\

nproc_per_node and world_size should both be set to the number of GPUs, so in your case both would be 1.
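As a sketch of that rule, the Python snippet below rebuilds the MSVD command posted above for an arbitrary GPU count, keeping --nproc_per_node and --world_size equal. The function name build_msvd_launch_cmd is hypothetical, and every other flag is copied unchanged from the posted command; whether --batch_size=128 fits in the memory of a single 3090 is not something this thread confirms.

```python
import shlex
from typing import List


def build_msvd_launch_cmd(n_gpus: int, master_port: int = 2963) -> List[str]:
    """Rebuild the posted MSVD stage-1 command for n_gpus GPUs (hypothetical helper).

    Per the thread, --nproc_per_node and --world_size must both equal the
    number of GPUs; all other flags are copied verbatim from the posted command.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={n_gpus}", "--master_port", str(master_port),
        "train_video.py",
        "--do_train", "--num_thread_reader=4", "--epochs=5",
        "--batch_size=128", "--n_display=20",
        "--data_path", "./Cap4Video/MSVD-Frames/Frames",
        "--features_path", "./Cap4Video/MSVD-Frames/Frames/MSVD_frames",
        "--output_dir", "ckpts/MSVD-resume",
        "--lr", "1e-4", "--max_words", "32", "--max_frames", "12",
        "--batch_size_val", "16",
        "--datatype", "msvd",
        "--feature_framerate", "1", "--coef_lr", "1e-3",
        "--freeze_layer_num", "0", "--slice_framepos", "2",
        "--loose_type", "--linear_patch", "2d", "--sim_header", "seqTransf",
        "--strategy", "2",
        "--pretrained_clip_name", "ViT-B/32",
        "--interaction", "wti", "--text_pool_type", "transf_avg",
        # Must match --nproc_per_node above.
        "--world_size", str(n_gpus),
    ]


if __name__ == "__main__":
    cmd = build_msvd_launch_cmd(n_gpus=1)  # single RTX 3090
    # The original command sets DATA_PATH as an environment-variable prefix.
    print("DATA_PATH=./Cap4Video/MSVD-Frames " + shlex.join(cmd))
```

With n_gpus=1 it prints the posted command with --nproc_per_node=1 and --world_size 1, ready to paste into a shell (shlex.join needs Python 3.8 or later).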

shams2023 commented 8 months ago

Thank you for your help! I hope we can keep in touch!

shams2023 commented 8 months ago

With your setup, does the first-stage train_video.py actually run? I ask because this configuration has far fewer path settings than the MSR-VTT one. Could you share a version of the code that runs successfully? Just send it to my email (2012664144@qq.com). I want to debug it and step through the process in detail. Sorry to bother you again, and thank you very much for your help!
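Whether the MSVD run needs the extra path options used for MSR-VTT depends on Cap4Video's data-loader code, which this thread does not show. One thing that can be checked locally before launching is that the two directories the MSVD command does pass actually exist. A minimal sketch, with missing_paths as a hypothetical helper name and the paths copied from the posted command:

```python
from pathlib import Path
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the flags whose directory does not exist (hypothetical helper)."""
    return [flag for flag, p in paths.items() if not Path(p).is_dir()]


if __name__ == "__main__":
    # Paths taken from the MSVD command posted in this thread.
    msvd_paths = {
        "--data_path": "./Cap4Video/MSVD-Frames/Frames",
        "--features_path": "./Cap4Video/MSVD-Frames/Frames/MSVD_frames",
    }
    missing = missing_paths(msvd_paths)
    if missing:
        print("Missing directories for:", ", ".join(missing))
    else:
        print("All MSVD paths found.")
```

This only catches typos or an unextracted frames folder; it cannot tell you whether the loader expects additional files such as caption or split lists.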

whwu95 / Cap4Video

How do I know if the video features will be better after interacting with caption? #17