Closed celestialxevermore closed 1 year ago
Dear author, I hope you have a good Thanksgiving!
I am really fascinated by your paper, and I am also thankful that you open-sourced your code.
I have a question about what exactly 'idx' means.
for i,(image, text, idx) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
I have only worked with the MSVD and MSRVTT datasets, which contain
text_ids, text_attention_mask, and text_token_type_ids, and, for the vision modality, the raw image and image_mask.
However, I cannot make sense of what 'idx' means in the MSCOCO dataset.
My guess is that 'idx' is the index of the video-text pair, but
I cannot find its exact meaning.
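To make my guess concrete, here is a minimal sketch in plain Python of what I imagine is happening. ToyPairDataset and batches are hypothetical names made up for illustration, and the assumption that 'idx' is simply the position of the pair in the dataset is only my guess, not something I found in the code.

```python
# Hypothetical toy example: the class, the helper, and the meaning of `idx`
# are assumptions for illustration, not taken from the repository.
class ToyPairDataset:
    def __init__(self, pairs):
        # pairs: list of (image_name, caption) tuples
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        image_name, caption = self.pairs[i]
        # Assumed meaning: the third element is the index of this pair
        return image_name, caption, i


def batches(dataset, batch_size):
    # Group consecutive samples and transpose them into three parallel lists
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        images, texts, idxs = zip(*items)
        yield list(images), list(texts), list(idxs)


dataset = ToyPairDataset([
    ("img_a.jpg", "a dog on a beach"),
    ("img_b.jpg", "two people cooking"),
    ("img_c.jpg", "a red car"),
])

for i, (image, text, idx) in enumerate(batches(dataset, 2)):
    print(i, image, text, idx)
```

If this guess is right, printing 'idx' for the first few batches of the real data_loader should show which pair each sample belongs to.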
Thank you for reading my question.