jayleicn / TVRetrieval

[ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
https://tvr.cs.unc.edu
MIT License

Inference for test_public #2

Closed Stuffooh closed 3 years ago

Stuffooh commented 4 years ago

Hi,

When I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val" everything works as expected.

However when I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public" I get the following error:

2020-04-09 17:27:16.745:INFO:__main__ - CUDA enabled.
2020-04-09 17:27:16.756:INFO:__main__ - Starting inference...
2020-04-09 17:27:16.757:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.23it/s]
2020-04-09 17:27:22.153:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
  File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
    start_inference()
  File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
    tasks=opt.tasks, max_after_nms=100)
  File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
    eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
  File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
    tasks=tasks)
  File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
    eval_dataset.load_gt_vid_name_for_query(is_svmr)
  File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
    assert "vid_name" in self.query_data[0]
AssertionError

I notice that "data/tvr_val_release.jsonl" has a different format than "data/tvr_test_public_release.jsonl", so I suspect this is the culprit and that the inference code needs to handle the test file differently.
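One quick way to check this suspicion is to compare the keys of the first record in each file. The following is only a minimal sketch: the helper names `first_record_keys` and `missing_keys` are hypothetical and not part of the repository, and the only field name taken from the traceback is `vid_name`.

```python
import json


def first_record_keys(jsonl_path):
    """Return the sorted keys of the first non-empty JSON line in a .jsonl file."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                return sorted(json.loads(line).keys())
    return []  # empty file: no record to inspect


def missing_keys(reference_path, other_path):
    """Keys found in the first record of reference_path but not in other_path."""
    reference = set(first_record_keys(reference_path))
    other = set(first_record_keys(other_path))
    return sorted(reference - other)


# Hypothetical usage from the repository root:
#   missing_keys("data/tvr_val_release.jsonl", "data/tvr_test_public_release.jsonl")
# If "vid_name" appears in the result, the assertion in start_end_dataset.py
# would fail for the test_public split.
```

This only looks at the first record, which mirrors what the failing assertion checks (`self.query_data[0]`).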

P.S. kudos for all the code and clear documentation provided in this repository.

jayleicn commented 4 years ago

Thanks! The test-public set is not released to the public. You need to use our Codalab evaluation portal to evaluate test-public results. Please see the instructions here: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission

Stuffooh commented 4 years ago

Ah okay, I suggest making a small change to the documentation to avoid confusion. I am referring to step two, where it states "SPLIT_NAME could be val or test_public.".

I will close this issue, as it was confusion on my end rather than an actual bug.

Stuffooh commented 4 years ago

I'm sorry, but I am still confused about how to test my model's performance on the test-public set.

The following is not possible because the test-public set is not released as you mentioned: "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public"

However if I look at the instructions: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission

It says to run bash standalone_eval/eval_sample.sh. However, as far as I can tell, this requires a test_predictions_metrics.json, which is not present because inference cannot be run without the test_public set.

I feel like I am missing something. I have no problem running inference and evaluation for validation, but I cannot seem to do the same for test-public and thus cannot submit my predictions.

Please help me figure out what I am missing here.

jayleicn commented 4 years ago

Hi @Stuffooh,

Sorry for the late reply, I was overwhelmed with deadlines.

The test-public data is released at https://github.com/jayleicn/TVRetrieval/tree/master/data, but the answers are withheld. So you can run inference on test-public and get predictions locally, but you will need to submit the predictions to our Codalab server to evaluate performance.

bash standalone_eval/eval_sample.sh evaluates the val set only; it is there to showcase the evaluation protocol.

Hope this helps.

Best, Jie

Stuffooh commented 4 years ago

Hi @jayleicn,

Thank you for your time.

If I understand correctly in order to get the performance on the public test set I need to do the following:

  1. run bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val (This gives me tvr_val_submission.json)
  2. run bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public (This gives me tvr_test_public_submission.json)
  3. Upload the results to the Codalab evaluation server, after which I can see the performance on the public-test set.

Currently I am having a problem with step 2, as the inference script does not work when I pass the test_public argument instead of the val argument. I checked the code, and inference.py contains a comment saying that eval_split_name should only be the val set, which is in line with what I am experiencing. To me it seems the inference script currently does not work for the public test set, which is required to get results on that set.

The particular error code I am receiving is in the first comment of this issue.

Could you please confirm the 3 steps I listed are the correct procedure to see the results of my model on the public-test set and could you please confirm the inference script is working as intended?

I apologize if I am missing something in which case I would love to hear what I'm missing in order to get the results on the public-test set.

Kind regards,

Kevin

jayleicn commented 4 years ago

Oh sorry, that's a somewhat wrong/outdated comment, the inference.py script should support test-public as well. And yes, your 3rd step is correct!

Jie

Stuffooh commented 4 years ago

@jayleicn

I just tested in a new environment and when I run the following command:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ val

I get the following output:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ val
tasks VCMR SVMR VR
2020-06-05 13:46:27.864:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_val_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'val', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:46:31.269:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:46:31.269:INFO:__main__ - CUDA enabled.
2020-06-05 13:46:31.282:INFO:__main__ - Starting inference...
2020-06-05 13:46:31.283:INFO:__main__ - Computing scores
Computing query2video scores: 100%|███████████████████████████████████████████████| 11/11 [01:32<00:00,  8.37s/it]
2020-06-05 13:48:03.394:INFO:__main__ - Inference with full-script.
Computing q embedding: 100%|████████████████████████████████████████████████████| 218/218 [01:37<00:00,  2.23it/s]
[SVMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:04<00:00, 2523.78it/s]
[VR] Loop over queries to generate predictions: 100%|█████████████████████| 10895/10895 [00:02<00:00, 3838.96it/s]
[VCMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:07<00:00, 1415.06it/s]
2020-06-05 13:50:30.647:INFO:__main__ - Saving/Evaluating before nms results
2020-06-05 13:51:22.897:INFO:__main__ - metrics_no_nms
OrderedDict([   (   'VCMR',

When I try to run the same with test_public:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ test_public

I get the following output:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ test_public
tasks VCMR SVMR VR
2020-06-05 13:51:45.136:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_test_public_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'test_public', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:51:48.228:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:51:48.228:INFO:__main__ - CUDA enabled.
2020-06-05 13:51:48.241:INFO:__main__ - Starting inference...
2020-06-05 13:51:48.241:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:17<00:00,  2.86s/it]
2020-06-05 13:52:05.502:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
  File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
    start_inference()
  File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
    tasks=opt.tasks, max_after_nms=100)
  File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
    eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
  File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
    tasks=tasks)
  File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
    eval_dataset.load_gt_vid_name_for_query(is_svmr)
  File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
    assert "vid_name" in self.query_data[0]
AssertionError

I think the assertion error shows up because the test data has a different format than the val data used during inference. I have made no changes to the inference scripts or data, so this can be considered a clean environment.
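To illustrate the kind of handling this would need, here is a minimal sketch of a guard that skips ground-truth lookup when a split carries no labels. It is not the repository's code and not necessarily how the maintainers addressed it: `has_ground_truth` and `build_query_to_video` are hypothetical helpers, and `desc_id` is an assumed field name (only `vid_name` appears in the traceback).

```python
def has_ground_truth(query_data, required_key="vid_name"):
    """Return True only if the split is non-empty and every record has the key."""
    return len(query_data) > 0 and all(required_key in q for q in query_data)


def build_query_to_video(query_data):
    """Map each query id to its ground-truth video name.

    Returns None for unlabeled splits instead of raising, so that a caller
    can still generate predictions and simply skip local evaluation.
    The "desc_id" key is an assumption made for this sketch.
    """
    if not has_ground_truth(query_data):
        return None
    return {q["desc_id"]: q["vid_name"] for q in query_data}
```

With a guard like this, a caller would compute metrics only when the mapping is not None and otherwise just write the predictions out for submission.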

Could you please take another look at inference for test_public and confirm that the scripts are working? If they are, what should I change to get it working on my end as well?

At the moment I am not able to generate tvr_test_public_submission.json with the inference script which is needed before I can go to step 3 and get the performance results on the test set.

jun0wanan commented 3 years ago

Hi, has your problem been solved? I am getting this error too...

jayleicn commented 3 years ago

Hi @Stuffooh @jun0wanan,

This issue has been fixed in the latest commit: https://github.com/jayleicn/TVRetrieval/commit/0b8b2c35641ab8595c7c1c4e01dc2a721a705c4a.

Best, Jie

tuyunbin commented 3 years ago

@jun0wanan Have you solved this problem? I used the latest code and still ran into it.

tuyunbin commented 3 years ago

@Stuffooh Hi, have you solved this problem? I used the latest code for inference on test_public, but I still hit the same problem as you.