Closed ruoqi77 closed 2 years ago
A good question.
In YouTube-VOS, some sequences contain objects that do not first appear in the first frame. In other words, these sequences have more than one reference frame.
More details can be found in #6.
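To make the idea of multiple reference frames concrete, here is a minimal sketch. It is not the repository's code: the function name `find_reference_frames` and the input format (a dict mapping frame index to the set of object ids annotated in that frame) are assumptions made up for illustration. It simply marks a frame as a reference frame whenever an object id shows up for the first time.

```python
from typing import Dict, List, Set


def find_reference_frames(annotated_ids: Dict[int, Set[int]]) -> List[int]:
    """Return the frame indices where at least one object id first appears.

    `annotated_ids` is a hypothetical input format: frame index -> set of
    object ids that have a ground-truth mask in that frame.
    """
    seen: Set[int] = set()
    reference_frames: List[int] = []
    for frame_idx in sorted(annotated_ids):
        # Object ids that have not been seen in any earlier frame.
        new_ids = annotated_ids[frame_idx] - seen
        if new_ids:
            reference_frames.append(frame_idx)
            seen |= new_ids
    return reference_frames


# DAVIS-like case: every object is annotated in the first frame.
davis_like = {0: {1, 2}}
# YouTube-VOS-like case: objects 2 and 3 enter the video later.
youtube_vos_like = {0: {1}, 15: {2}, 40: {3}}

print(find_reference_frames(davis_like))
print(find_reference_frames(youtube_vos_like))
```

In the DAVIS-like case the result contains only frame 0, while the YouTube-VOS-like case yields one reference frame per newly appearing object, which would explain why a reference input is kept as a list.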
Thanks for your quick answer! So multiple reference frames are only used for videos where new objects appear later in the sequence, as in YouTube-VOS, and in other cases (like DAVIS) the reference frame is always the first one, right?
Exactly.
One more detail: does the channel dimension of the variable 'reference_labels' (i.e. 'obj_nums') include the background? For example, if there is only one object, does 'reference_labels' have 1 channel or 2?
Hi, I'm confused about some details of inference. In your code, the input 'all_reference_embeddings' of 'global_matching_for_eval' is a list. Do you use more than one frame as the reference for global matching during inference (in your network pipeline, only the first frame is used as the reference frame)? If so, how do you select the indices of the frames to use?