Thanks! The test-public set is not released to the public. You need to use our Codalab evaluation portal to evaluate test-public results. Please see the instructions here: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission
Ah okay, I suggest making a small change to the documentation to avoid confusion. I am referring to step two, where it states "SPLIT_NAME could be val or test_public."
I will close this issue as it's a confusion on my end instead of an issue.
I'm sorry, but I am still confused about testing my model's performance on the test-public set.
The following is not possible because the test-public set is not released as you mentioned:
"bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public"
However if I look at the instructions: https://github.com/jayleicn/TVRetrieval/tree/master/standalone_eval#codalab-submission
It mentions running bash standalone_eval/eval_sample.sh. However, as far as I can tell, this requires a test_predictions_metrics.json, which is not present because inference cannot be done without the test_public set.
I feel like I am missing something. I have no problem running inference and evaluation for validation, but I cannot seem to do the same for test-public and thus am not able to submit my predictions.
Please help me figure out what I am missing here.
Hi @Stuffooh,
Sorry for the late reply, I was overwhelmed with deadlines.
The test-public data is released as in https://github.com/jayleicn/TVRetrieval/tree/master/data, but the answers are reserved. So you can run inference on test-public and get some predictions locally, but you will need to submit the predictions to our Codalab server to evaluate performance.
bash standalone_eval/eval_sample.sh evaluates the val set only; it is here to showcase the evaluation protocol.
Hope this helps.
Best, Jie
Hi @jayleicn,
Thank you for your time.
If I understand correctly, in order to get the performance on the public test set I need to do the following:
1. bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val (This gives me tvr_val_submission.json)
2. bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public (This gives me tvr_test_public_submission.json)
Currently I am having a problem with step 2, as the inference script does not work when passing the test_public argument instead of the val argument. I checked the code, and in inference.py there is a comment which mentions that eval_split_name should only be the val set, which is in line with what I am experiencing. To me it seems the inference script currently does not work for the public test set, which is required to get results on the public test set.
The particular error code I am receiving is in the first comment of this issue.
Could you please confirm the 3 steps I listed are the correct procedure to see the results of my model on the public-test set and could you please confirm the inference script is working as intended?
I apologize if I am missing something in which case I would love to hear what I'm missing in order to get the results on the public-test set.
Kind regards,
Kevin
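For readers following the steps above, here is a minimal sketch of bundling the two prediction files once both inference runs succeed. It rests on an assumption not stated in this thread: that the Codalab portal expects a single zip with both JSON files at the top level. The function name bundle_submission and the output name submission.zip are made up for illustration; the Codalab instructions linked earlier are the authoritative source for the expected layout.

```python
import zipfile
from pathlib import Path


def bundle_submission(val_json, test_json, out_zip):
    """Zip two prediction files with no enclosing folder.

    Assumption: the evaluation portal wants both files at the top
    level of one archive. Check the linked Codalab instructions.
    """
    files = [Path(val_json), Path(test_json)]
    missing = [str(p) for p in files if not p.is_file()]
    if missing:
        # Fail early, e.g. when test_public inference crashed before saving.
        raise FileNotFoundError("prediction file(s) not found: " + ", ".join(missing))
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            # arcname=p.name drops directory components inside the archive.
            zf.write(p, arcname=p.name)
    return Path(out_zip)


# Example call (paths depend on where inference.sh wrote its output):
# bundle_submission("tvr_val_submission.json",
#                   "tvr_test_public_submission.json",
#                   "submission.zip")
```

The early FileNotFoundError makes the situation in this thread visible before upload: if the test_public run never produced its file, no archive is written.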
Oh sorry, that's a somewhat wrong/outdated comment; the inference.py script should support test-public as well. And yes, your 3rd step is correct!
Jie
@jayleicn
I just tested in a new environment and when I run the following command:
bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ val
I get the following output:
bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ val
tasks VCMR SVMR VR
2020-06-05 13:46:27.864:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_val_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'val', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:46:31.269:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:46:31.269:INFO:__main__ - CUDA enabled.
2020-06-05 13:46:31.282:INFO:__main__ - Starting inference...
2020-06-05 13:46:31.283:INFO:__main__ - Computing scores
Computing query2video scores: 100%|███████████████████████████████████████████████| 11/11 [01:32<00:00, 8.37s/it]
2020-06-05 13:48:03.394:INFO:__main__ - Inference with full-script.
Computing q embedding: 100%|████████████████████████████████████████████████████| 218/218 [01:37<00:00, 2.23it/s]
[SVMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:04<00:00, 2523.78it/s]
[VR] Loop over queries to generate predictions: 100%|█████████████████████| 10895/10895 [00:02<00:00, 3838.96it/s]
[VCMR] Loop over queries to generate predictions: 100%|███████████████████| 10895/10895 [00:07<00:00, 1415.06it/s]
2020-06-05 13:50:30.647:INFO:__main__ - Saving/Evaluating before nms results
2020-06-05 13:51:22.897:INFO:__main__ - metrics_no_nms
OrderedDict([ ( 'VCMR',
When I try to run the same with test_public:
bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/ test_public
I get the following output:
bash baselines/crossmodal_moment_localization/scripts/inference.sh /home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/ test_public
tasks VCMR SVMR VR
2020-06-05 13:51:45.136:INFO:__main__ - Setup config, data and model...
------------ Options -------------
{'add_pe_rnn': 'False', 'bsz': '128', 'clip_length': '1.5', 'conv_kernel_size': '5', 'conv_stride': '1', 'cross_att_drop': '0.1', 'ctx_mode': 'video_sub', 'data_ratio': '1.0', 'debug': 'False', 'desc_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/query.h5', 'device': '0', 'device_ids': '[0]', 'drop': '0.1', 'dset_name': 'tvr', 'encoder_type': 'transformer', 'eval_context_bsz': '200', 'eval_id': 'None', 'eval_path': 'data/tvr_test_public_release.jsonl', 'eval_query_bsz': '50', 'eval_split_name': 'test_public', 'eval_tasks_at_training': "['VCMR', 'SVMR', 'VR']", 'eval_untrained': 'False', 'exp_id': 'alberta-base-v2_epochs-5_no-gradient', 'external_inference_vr_res_path': 'None', 'glove_path': 'None', 'grad_clip': '-1', 'hard_negtiave_start_epoch': '20', 'hard_pool_size': '20', 'hidden_size': '256', 'initializer_range': '0.02', 'input_drop': '0.1', 'lr': '0.0001', 'lr_warmup_proportion': '0.01', 'lw_neg_ctx': '1', 'lw_neg_q': '1', 'lw_st_ed': '0.01', 'margin': '0.1', 'max_before_nms': '200', 'max_ctx_l': '100', 'max_desc_l': '30', 'max_es_cnt': '10', 'max_position_embeddings': '300', 'max_pred_l': '16', 'max_sub_l': '50', 'max_vcmr_video': '100', 'min_pred_l': '2', 'model_dir': '/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/', 'n_epoch': '100', 'n_heads': '4', 'nms_thd': '-1', 'no_core_driver': 'True', 'no_cross_att': 'False', 'no_merge_two_stream': 'False', 'no_modular': 'False', 'no_norm_tfeat': 'False', 'no_norm_vfeat': 'True', 'no_pin_memory': 'False', 'no_self_att': 'False', 'num_workers': '8', 'pe_type': 'cosine', 'q2c_alpha': '20', 'q_feat_size': '768', 'ranking_loss_type': 'hinge', 'results_dir': 'baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13', 'results_root': 'results', 'seed': '2018', 'span_predictor_type': 'conv', 
'stack_conv_predictor_conv_kernel_sizes': '-1', 'stop_task': 'VCMR', 'sub_bert_path': '/home/kevin/transformer_features_tvr/alberta-base-v2_epochs-5_no-gradient/sub_clip_level.h5', 'sub_feat_size': '768', 'tasks': "['VCMR', 'SVMR', 'VR']", 'train_path': 'data/tvr_train_release.jsonl', 'train_span_start_epoch': '0', 'use_glove': 'False', 'vid_feat_path': 'data/tvr_feature_release/video_feature/tvr_resnet152_rgb_max_i3d_rgb600_avg_cat_cl-1.5.h5', 'vid_feat_size': '3072', 'video_duration_idx_path': 'data/tvr_video2dur_idx.json', 'vocab_size': '-1', 'wd': '0.01', 'word2idx_path': 'None'}
-------------------
2020-06-05 13:51:48.228:INFO:__main__ - Loaded model saved at epoch 39 from checkpoint: baselines/crossmodal_moment_localization/results/tvr-video_sub-alberta-base-v2_epochs-5_no-gradient-2020_05_19_15_28_13/model.ckpt
2020-06-05 13:51:48.228:INFO:__main__ - CUDA enabled.
2020-06-05 13:51:48.241:INFO:__main__ - Starting inference...
2020-06-05 13:51:48.241:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:17<00:00, 2.86s/it]
2020-06-05 13:52:05.502:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
start_inference()
File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
tasks=opt.tasks, max_after_nms=100)
File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
tasks=tasks)
File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
eval_dataset.load_gt_vid_name_for_query(is_svmr)
File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
assert "vid_name" in self.query_data[0]
AssertionError
I think the assertion error shows up because the test data has a different format from the val data used during inference. I have made no changes to the inference scripts or data, so it can be considered a clean environment.
Could you please take another look at inference for test_public and confirm the scripts are working? If so, what should I change to get it working on my end as well?
At the moment I am not able to generate tvr_test_public_submission.json with the inference script, which is needed before I can go to step 3 and get the performance results on the test set.
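To illustrate the kind of guard that would avoid the assertion above, here is a minimal sketch. It is not the repository's code and not necessarily what any later fix does: QueryDataset is a hypothetical stand-in for the dataset class in start_end_dataset.py, the record fields desc_id and desc are illustrative, and tasks_for_split is an invented helper. The idea is to check for the "vid_name" key first and skip SVMR, which needs the ground-truth video, when the split has no answers.

```python
class QueryDataset:
    """Hypothetical stand-in for the dataset class in start_end_dataset.py."""

    def __init__(self, query_data):
        self.query_data = query_data  # list of dicts, one per query
        self.gt_vid_names = None

    def has_gt(self):
        """True when the first query record carries a ground-truth video name."""
        return len(self.query_data) > 0 and "vid_name" in self.query_data[0]

    def load_gt_vid_name_for_query(self, load_gt):
        """Store ground-truth video names only when requested and available."""
        if not load_gt:
            self.gt_vid_names = None
            return
        if not self.has_gt():
            # A readable error instead of a bare AssertionError.
            raise ValueError("split has no 'vid_name'; ground truth is unavailable")
        self.gt_vid_names = [q["vid_name"] for q in self.query_data]


def tasks_for_split(dataset, tasks):
    """Drop SVMR when the split has no ground-truth video names."""
    if dataset.has_gt():
        return list(tasks)
    return [t for t in tasks if t != "SVMR"]


val_like = QueryDataset([{"desc_id": 1, "desc": "query text", "vid_name": "video_a"}])
test_like = QueryDataset([{"desc_id": 2, "desc": "query text"}])
print(tasks_for_split(val_like, ["VCMR", "SVMR", "VR"]))
print(tasks_for_split(test_like, ["VCMR", "SVMR", "VR"]))
```

With a guard like this, a split without answers would still produce VCMR and VR predictions for submission instead of stopping at the assert.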
Hi, has your problem been solved? I also get this error...
Hi @Stuffooh @jun0wanan,
This issue has been fixed in the latest commit: https://github.com/jayleicn/TVRetrieval/commit/0b8b2c35641ab8595c7c1c4e01dc2a721a705c4a.
Best, Jie
@jun0wanan Have you solved this problem? I used the latest code and still run into it.
@Stuffooh Hi, have you solved this problem? I used the latest code for inference on test_public, but I still hit the same problem as you.
Hi,
When I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME val" everything works as expected.
However, when I run "bash baselines/crossmodal_moment_localization/scripts/inference.sh MODEL_DIR_NAME test_public" I get the following error:
2020-04-09 17:27:16.745:INFO:__main__ - CUDA enabled.
2020-04-09 17:27:16.756:INFO:__main__ - Starting inference...
2020-04-09 17:27:16.757:INFO:__main__ - Computing scores
Computing query2video scores: 100%|█████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.23it/s]
2020-04-09 17:27:22.153:INFO:__main__ - Inference with full-script.
Traceback (most recent call last):
File "baselines/crossmodal_moment_localization/inference.py", line 584, in <module>
start_inference()
File "baselines/crossmodal_moment_localization/inference.py", line 578, in start_inference
tasks=opt.tasks, max_after_nms=100)
File "baselines/crossmodal_moment_localization/inference.py", line 486, in eval_epoch
eval_submission_raw = get_eval_res(model, eval_dataset, opt, tasks, max_after_nms=max_after_nms)
File "baselines/crossmodal_moment_localization/inference.py", line 456, in get_eval_res
tasks=tasks)
File "baselines/crossmodal_moment_localization/inference.py", line 277, in compute_query2ctx_info
eval_dataset.load_gt_vid_name_for_query(is_svmr)
File "/home/kevin/TVRetrieval/baselines/crossmodal_moment_localization/start_end_dataset.py", line 241, in load_gt_vid_name_for_query
assert "vid_name" in self.query_data[0]
AssertionError
I notice that "data/tvr_val_release.jsonl" has a different format than "data/tvr_test_public_release.jsonl", so I suspect this is the culprit and that it needs to be handled differently in the inference code.
P.S. kudos for all the code and clear documentation provided in this repository.
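A quick way to check the format difference mentioned above is to compare the keys of the first record in each JSONL file. This is a minimal sketch; the helper names first_record_keys and missing_keys are made up for illustration, and only the two file paths come from this thread.

```python
import json


def first_record_keys(jsonl_path):
    """Return the set of keys in the first non-empty line of a JSONL file."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return set(json.loads(line).keys())
    return set()  # empty file


def missing_keys(reference_path, other_path):
    """Keys present in the reference file's first record but absent in the other's."""
    return sorted(first_record_keys(reference_path) - first_record_keys(other_path))


# Example call with the paths from this issue:
# print(missing_keys("data/tvr_val_release.jsonl",
#                    "data/tvr_test_public_release.jsonl"))
```

If "vid_name" appears in the result, that matches the failing assert in the traceback.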