Thank you for your great work and open-source code.
I'm having trouble understanding the GT saliency scores (only localized 2-sec clips). Could you briefly explain what they mean?
Also, how do the predicted saliency scores (for all 2-sec clips) correspond to them?
Thanks!
Best,
Kevin
Build models...
Loading feature extractors...
Loading CLIP models
Loading trained Moment-DETR model...
Run prediction...
------------------------------idx0
>> query: Chef makes pizza and cuts it up.
>> video_path: run_on_video/example/RoripwjYFp8_60.0_210.0.mp4
>> GT moments: [[106, 122]]
>> Predicted moments ([start_in_seconds, end_in_seconds, score]): [
[49.967, 64.9129, 0.9421],
[66.4396, 81.0731, 0.9271],
[105.9434, 122.0372, 0.9234],
[93.2057, 103.3713, 0.2222],
...,
[45.3834, 52.2183, 0.0005]
]
>> GT saliency scores (only localized 2-sec clips): # what does this mean?
[[2, 3, 3], [2, 3, 3], ...]
>> Predicted saliency scores (for all 2-sec clip): # how does this correspond to the GT saliency scores?
[-0.9258, -0.8115, -0.7598, ..., 0.0739, 0.1068]
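To make the question concrete, here is how I currently read the two lists. This is a minimal Python sketch under my own assumptions, not code from this repo. I assume that each inner GT list such as [2, 3, 3] holds one score per annotator for one annotated 2-sec clip, that the GT scores come with clip indices the log does not print, and that the predicted list has one score per 2-sec clip of the whole video. `moment_to_clip_ids` and `align_saliency` are hypothetical helpers written only for illustration. Is this the right reading?

```python
from typing import List, Sequence, Tuple

CLIP_LEN = 2.0  # seconds per clip, taken from the "2-sec clip" wording in the log

def moment_to_clip_ids(moment: Tuple[float, float], clip_len: float = CLIP_LEN) -> List[int]:
    """Hypothetical helper: clip indices covered by a [start, end) moment in seconds.

    Assumes start and end are multiples of clip_len, as in the GT moment [106, 122].
    """
    start, end = moment
    return list(range(int(start // clip_len), int(end // clip_len)))

def align_saliency(
    gt_clip_ids: Sequence[int],
    gt_scores: Sequence[Sequence[int]],
    pred_scores: Sequence[float],
) -> List[Tuple[int, float, float]]:
    """Hypothetical helper: pair each annotated clip's mean GT score with the
    predicted score at the same clip index (my assumption about the alignment)."""
    pairs = []
    for clip_id, annotator_scores in zip(gt_clip_ids, gt_scores):
        # Assumption: the inner list holds one score per annotator, so average them.
        mean_gt = sum(annotator_scores) / len(annotator_scores)
        # The predicted list covers every clip, so index it by the GT clip id.
        pairs.append((clip_id, mean_gt, pred_scores[clip_id]))
    return pairs
```

With the GT moment [106, 122] above, `moment_to_clip_ids((106, 122))` gives clips 53 to 60. Since the predicted values are unbounded (e.g. -0.9258) rather than on the GT's integer scale, I assume only their relative order within a video is meaningful, but I'm not sure about that either.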