Inference for test_public

Hi,

When I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val" everything works as expected.

However, when I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public", I get the following error:

2020-04-09 17:27:16.745:INFO:__main__ - CUDA enabled.
2020-04-09 17:27:16.756:INFO:__main__ - Starting inference...
2020-04-09 17:27:16.757:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.23it/s]
2020-04-09 17:27:22.153:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
  File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
    start_inference()
  File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
    tasks=opt.tasks, max_after_nms=100)
  File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
    eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
  File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
    tasks=tasks)
  File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
    eval_dataset.load_gt_vid_name_for_query(is_svmr)
  File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
    assert "vid_name" in self.query_data[0]
AssertionError

I notice that "data/tvr_val_release.jsonl" has a different format than "data/tvr_test_public_release.jsonl", so I suspect this is the culprit and that it needs to be handled differently in the inference code.
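A quick way to check this suspicion is to compare the top-level keys of the first record in each file. The snippet below is only a minimal sketch: it assumes both files are JSON Lines (one JSON object per line), and the helper names first_record_keys and keys_missing_from are made up for illustration and are not part of the repository.

```python
import json


def first_record_keys(jsonl_path):
    """Return the top-level keys of the first non-empty line of a JSON Lines file."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                return set(json.loads(line).keys())
    # Empty file: no record, so no keys.
    return set()


def keys_missing_from(reference_path, other_path):
    """Keys found in the first record of reference_path but not in other_path."""
    return first_record_keys(reference_path) - first_record_keys(other_path)
```

Calling keys_missing_from("data/tvr_val_release.jsonl", "data/tvr_test_public_release.jsonl") should list any fields that only the val file carries. Given the assertion in the traceback, "vid_name" is the one I would expect to see there.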

P.S. kudos for all the code and clear documentation provided in this repository.

Thanks! The test-public set is not released to the public. You need to use our Codalab evaluation portal to evaluate test-public results. Please see the instructions here: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission

Ah okay. I suggest making a small change to the documentation to avoid confusion. I am referring to step two, where it states "SPLIT_NAME could be val or test_public.".

I will close this issue, as the confusion was on my end rather than an actual issue.

I'm sorry, but I am still confused about how to test my model's performance on the test-public set.

The following is not possible because the test-public set is not released as you mentioned: "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public"

However if I look at the instructions: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission

It says to run bash standalone_eval/eval_sample.sh. However, as far as I can tell, this requires a test_predictions_metrics.json, which is not present because inference cannot be run without the test_public set.

I feel like I am missing something. I have no problem running inference and evaluation for validation, but I cannot seem to do the same for test-public and thus cannot submit my predictions.

Please help me understand what I am missing here.

Hi @Stuffooh,

Sorry for the late reply, I was overwhelmed with deadlines.

The test-public data is released, as in https://github.com/jayleicn/TVRetrieval/tree/master/data, but the answers are reserved. So you can run inference on test-public and get predictions locally, but you will need to submit them to our Codalab server to evaluate performance.

bash standalone_eval/eval_sample.sh evaluates the val set only; it is there to showcase the evaluation protocol.

Hope this helps.

Best, Jie

Hi @jayleicn,

Thank you for your time.

If I understand correctly, in order to get the performance on the public test set I need to do the following:

1. Run bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val (this gives me tvr_val_submission.json).
2. Run bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public (this gives me tvr_test_public_submission.json).
3. Upload the results to the Codalab evaluation server, after which I can see the performance on the public test set.

Currently I am having a problem with step 2, as the inference script does not work when passing the test_public argument instead of the val argument. I checked the code, and in inference.py there is a comment which mentions that eval_split_name should only be the val set, which is in line with what I am experiencing. To me it seems the inference script currently does not work for the public test set, which is required to get results on that set.

The particular error I am receiving is in the first comment of this issue.

Could you please confirm the 3 steps I listed are the correct procedure to see the results of my model on the public-test set and could you please confirm the inference script is working as intended?

I apologize if I am missing something in which case I would love to hear what I'm missing in order to get the results on the public-test set.

Kind regards,

Kevin

Oh sorry, that's a somewhat wrong/outdated comment; the inference.py script should support test-public as well. And yes, your 3rd step is correct!

Jie

@jayleicn

I just tested in a new environment and when I run the following command:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ val

I get the following output:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ val
tasks VCMR SVMR VR
2020-06-05 13:46:27.864:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_val_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'val', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:46:31.269:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:46:31.269:INFO:__main__ - CUDA enabled.
2020-06-05 13:46:31.282:INFO:__main__ - Starting inference...
2020-06-05 13:46:31.283:INFO:__main__ - Computing scores
Computing query2video scores: 100%|███████████████████████████████████████████████| 11/11 [01:32<00:00,  8.37s/it]
2020-06-05 13:48:03.394:INFO:__main__ - Inference with full-script.
Computing q embedding: 100%|████████████████████████████████████████████████████| 218/218 [01:37<00:00,  2.23it/s]
[SVMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:04<00:00, 2523.78it/s]
[VR] Loop over queries to generate predictions: 100%|█████████████████████| 10895/10895 [00:02<00:00, 3838.96it/s]
[VCMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:07<00:00, 1415.06it/s]
2020-06-05 13:50:30.647:INFO:__main__ - Saving/Evaluating before nms results
2020-06-05 13:51:22.897:INFO:__main__ - metrics_no_nms
OrderedDict([   (   'VCMR',

When I try to run the same with test_public:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ test_public

I get the following output:

bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ test_public
tasks VCMR SVMR VR
2020-06-05 13:51:45.136:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_test_public_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'test_public', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 
'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:51:48.228:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:51:48.228:INFO:__main__ - CUDA enabled.
2020-06-05 13:51:48.241:INFO:__main__ - Starting inference...
2020-06-05 13:51:48.241:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:17<00:00,  2.86s/it]
2020-06-05 13:52:05.502:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
  File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
    start_inference()
  File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
    tasks=opt.tasks, max_after_nms=100)
  File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
    eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
  File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
    tasks=tasks)
  File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
    eval_dataset.load_gt_vid_name_for_query(is_svmr)
  File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
    assert "vid_name" in self.query_data[0]
AssertionError

I think the assertion error shows up because the test data has a different format from the val data used during inference. I have made no changes to the inference scripts or data, so this can be considered a clean environment.

Could you please take another look at inference for test_public and confirm the scripts are working? If so, what should I change to get it working on my end as well?
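To illustrate what I mean, here is a rough sketch of how the ground-truth lookup could be made optional instead of asserted. This is not the repository's code and not a proposed patch: has_ground_truth and ground_truth_video_names are hypothetical helpers, and the only thing taken from the traceback is the "vid_name" key checked on the query records.

```python
def has_ground_truth(query_data, key="vid_name"):
    """True only if the list is non-empty and every query record carries the key."""
    return len(query_data) > 0 and all(key in query for query in query_data)


def ground_truth_video_names(query_data):
    """Return the ground-truth video names, or None when the labels are withheld."""
    if not has_ground_truth(query_data):
        # For a split without labels, signal "no ground truth" instead of
        # failing an assertion; the caller then has to skip the tasks or
        # metrics that depend on it.
        return None
    return [query["vid_name"] for query in query_data]
```

With something along these lines, the caller could still write out predictions for a split without labels and only skip whatever needs the ground-truth video.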

At the moment I am not able to generate tvr_test_public_submission.json with the inference script which is needed before I can go to step 3 and get the performance results on the test set.

Hi, has your problem been solved? I am getting this error as well...

Hi @Stuffooh @jun0wanan,

This issue has been fixed in the latest commit: https://github.com/jayleicn/TVRetrieval/commit/0b8b2c35641ab8595c7c1c4e01dc2a721a705c4a.

Best, Jie

@jun0wanan Have you solved this problem? I used the latest code and I still run into it.

@Stuffooh Hi, have you solved this problem? I used the latest code for inference on test_public, but I still ran into the same problem as you.
