v-iashin / BMT

Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
https://v-iashin.github.io/bmt
MIT License

How to train the captioning module on ground truth proposals #47

Closed adeljalalyousif closed 1 year ago

adeljalalyousif commented 1 year ago

Hi Iashin, I need to train the captioning module on ground truth proposals. What should I do?

v-iashin commented 1 year ago

Hi adeljalalyousif

To train the captioning module on ground truth proposals, run the following:

# conda activate bmt
python main.py \
    --procedure train_cap \
    --B 32
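If you launch runs from a Python script (for example, to try several batch sizes), the same command can be assembled as an argument list. This is a minimal sketch: `build_train_cap_cmd` and `run_train_cap` are hypothetical helpers, and they pass only the two flags shown above (`--procedure train_cap` and `--B`).

```python
import subprocess
import sys
from typing import List


def build_train_cap_cmd(batch_size: int = 32) -> List[str]:
    """Hypothetical helper: `python main.py --procedure train_cap --B <n>` as argv."""
    return [sys.executable, "main.py", "--procedure", "train_cap", "--B", str(batch_size)]


def run_train_cap(batch_size: int = 32) -> int:
    # Assumes the current working directory is the repository root (where main.py lives)
    # and that the interpreter running this script belongs to the activated `bmt` env.
    return subprocess.run(build_train_cap_cmd(batch_size), check=False).returncode


if __name__ == "__main__":
    sys.exit(run_train_cap(32))
```

Using `sys.executable` rather than a bare `python` keeps the child process on the same interpreter (and therefore the same environment) as the calling script.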
adeljalalyousif commented 1 year ago

Thanks for your response, but I got this error: "FileNotFoundError: [Errno 2] No such file or directory: './best_prop_model.pt'". Full output:

{'B': 32, 'H': 4, 'N': 2, 'anchors_num_audio': 48, 'anchors_num_video': 128, 'audio_feature_name': 'vggish', 'audio_feature_timespan': 0.96, 'audio_features_path': './data/vggish_npy/', 'avail_mp4_path': './data/available_mp4.txt', 'betas': [0.9, 0.999], 'conv_layers_audio': [512, 512], 'conv_layers_video': [512, 512], 'd_aud': 128, 'd_ff_audio': None, 'd_ff_caps': None, 'd_ff_video': None, 'd_model': 1024, 'd_model_audio': None, 'd_model_caps': 300, 'd_model_video': None, 'd_vid': 1024, 'debug': False, 'device_ids': [0], 'dout_p': 0.1, 'early_stop_after': 30, 'end_token': '', 'epoch_num': 4, 'eps': 1e-08, 'feature_timespan_in_fps': 64, 'finetune_cap_encoder': False, 'finetune_prop_encoder': False, 'fps_at_extraction': 25, 'grad_clip': None, 'inf_B_coeff': 2, 'kernel_sizes_audio': [5, 13, 23, 35, 51, 69, 91, 121, 161, 211], 'kernel_sizes_video': [1, 5, 9, 13, 19, 25, 35, 45, 61, 79], 'layer_norm': False, 'log_dir': './log/', 'lr': 5e-05, 'lr_patience': None, 'lr_reduce_factor': None, 'max_len': 30, 'max_prop_per_vid': 100, 'min_freq_caps': 1, 'modality': 'audio_video', 'model': 'av_transformer', 'momentum': 0.0, 'nms_tiou_thresh': None, 'noobj_coeff': 100, 'obj_coeff': 1, 'one_by_one_starts_at': 1, 'optimizer': 'adam', 'pad_audio_feats_up_to': 800, 'pad_token': '', 'pad_video_feats_up_to': 300, 'pretrained_cap_model_path': './log/best_cap_model.pt', 'pretrained_prop_model_path': None, 'procedure': 'train_cap', 'prop_pred_path': './log/prop_results_val_1_e0_maxprop100.json', 'reference_paths': ['./data/val_1_no_missings.json', './data/val_2_no_missings.json'], 'scheduler': 'constant', 'smoothing': 0.7, 'start_token': '', 'tIoUs': [0.3, 0.5, 0.7, 0.9], 'to_log': True, 'train_json_path': './data/train.json', 'train_meta_path': './data/train.csv', 'unfreeze_word_emb': False, 'use_linear_embedder': False, 'val_1_meta_path': './data/val_1.csv', 'val_2_meta_path': './data/val_2.csv', 'val_prop_meta_path': None, 'video_feature_name': 'i3d', 'video_features_path': './data/i3d_25fps_stack64step64_2stream_npy/', 'weight_decay': 0, 'word_emb_caps': 'glove.840B.300d'}
Contructing caption_iterator for "train" phase
Contructing caption_iterator for "val_1" phase
Contructing caption_iterator for "val_2" phase
Using vanilla Generator
initialization: xavier
Glove emb of the same size as d_model_caps
Pretrained prop path: ./best_prop_model.pt
Traceback (most recent call last):
  File "main.py", line 200, in <module>
    main(cfg)
  File "main.py", line 11, in main
    train_cap(cfg)
  File "/media/adel/Data3/BMT_original/scripts/train_captioning_module.py", line 40, in train_cap
    model = BiModalTransformer(cfg, train_dataset)
  File "/media/adel/Data3/BMT_original/model/captioning_module.py", line 151, in __init__
    cap_model_cpt = torch.load(cfg.pretrained_prop_model_path, map_location='cpu')
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './best_prop_model.pt'
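The traceback ends in `torch.load(cfg.pretrained_prop_model_path, map_location='cpu')`, i.e. the captioning model tries to read a proposal checkpoint that is not on disk. One defensive pattern is to load such a checkpoint only when a path is actually configured, and to fail with a clearer message otherwise. This is a minimal sketch, not BMT's code: `maybe_load_checkpoint` is a hypothetical helper, and the loader is passed in so that `torch.load` can be plugged in at the call site.

```python
import os
from typing import Any, Callable, Optional


def maybe_load_checkpoint(path: Optional[str], load_fn: Callable[[str], Any]) -> Optional[Any]:
    """Hypothetical helper: load a checkpoint only if a path was configured."""
    if path is None:
        # Nothing configured: the caller should build the module from scratch instead.
        return None
    if not os.path.isfile(path):
        # Relative paths such as './best_prop_model.pt' resolve against the working directory.
        raise FileNotFoundError(
            f"checkpoint {path!r} not found (working directory: {os.getcwd()}); "
            "pass the correct path or leave the option unset"
        )
    return load_fn(path)
```

At the call site this would look like `maybe_load_checkpoint(cfg.pretrained_prop_model_path, lambda p: torch.load(p, map_location='cpu'))`, with the caller handling the `None` case.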

###########################################################################

I need to train the captioning module on ground truth proposals, without using learned proposals.
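The config printed at the top of the log is a plain Python dict literal, so it can be parsed back with `ast.literal_eval` to check which proposal-related options were actually set. In that dump `pretrained_prop_model_path` is `None`, yet the log prints `Pretrained prop path: ./best_prop_model.pt`, which suggests the path is coming from somewhere other than the parsed command-line config. A minimal sketch for this kind of check; `proposal_settings` is a hypothetical helper and the key list is simply copied from the dump.

```python
import ast
from typing import Any, Dict

# Proposal-related keys as they appear in the printed config (list chosen for illustration).
PROPOSAL_KEYS = (
    "pretrained_prop_model_path",
    "finetune_prop_encoder",
    "prop_pred_path",
    "val_prop_meta_path",
)


def proposal_settings(cfg_dump: str) -> Dict[str, Any]:
    """Parse the printed config dict and return only the proposal-related entries."""
    cfg = ast.literal_eval(cfg_dump)  # safe: evaluates literals only, never arbitrary code
    return {key: cfg.get(key) for key in PROPOSAL_KEYS}
```

Pasting the first line of the log into `proposal_settings(...)` returns a small dict that can be compared against what the model code actually loads.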

adeljalalyousif commented 1 year ago

After downloading 'best_prop_model.pt', training works, but only on the CPU. How can I make training run on the GPU? I have an RTX 3060 with 6 GB; I think my GPU memory is insufficient.
So, how can I train the captioning module on ground truth proposals without using learned proposals?
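Before tuning the batch size, it may help to confirm whether PyTorch can see the GPU at all: if `torch.cuda.is_available()` returns `False` in the training environment, PyTorch cannot place anything on the RTX 3060, independent of how much memory it has. A minimal diagnostic sketch; `cuda_report` is a hypothetical name, and only standard `torch` calls are used.

```python
def cuda_report() -> str:
    """Hypothetical diagnostic: describe what PyTorch sees on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        # Possible causes include a CPU-only PyTorch build or a build that does not
        # support the installed GPU/driver; torch.version.cuda is None for CPU-only builds.
        return f"CUDA not available (torch {torch.__version__}, CUDA build: {torch.version.cuda})"
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024 ** 3
    return f"{props.name}: {total_gib:.1f} GiB total (torch {torch.__version__})"


if __name__ == "__main__":
    print(cuda_report())
```

If the GPU is reported but training then runs out of memory, a smaller value than `--B 32` is one option to try.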