z-x-yang / CFBI

The official implementation of CFBI(+): Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration.
BSD 3-Clause "New" or "Revised" License

About sequence length during inference #53

Closed ruoqi77 closed 2 years ago

ruoqi77 commented 2 years ago

Hi, I'm confused about some details during inference. In your code, 'global_matching_for_eval' takes 'all_reference_embeddings' as a list. Do you use more than one frame as a reference for global matching during inference (in your network pipeline, only the first frame is used as the reference frame)? And if so, how do you select the frame indices to use?
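For context, here is a minimal sketch (not the repository's actual implementation) of how a list of per-frame reference embeddings could be matched globally against the current frame. The function name, tensor shapes, and the flattened pixel layout are assumptions made for illustration only:

```python
import torch

def global_matching_sketch(all_reference_embeddings, query_embedding):
    """Hypothetical sketch: match a query frame against a list of reference frames.

    all_reference_embeddings: list of (H*W, C) tensors, one per reference frame
    query_embedding:          (H*W, C) tensor for the current frame
    Returns, for each query pixel, the distance to its nearest reference pixel.
    """
    # Stack every pixel from every reference frame into a single bank.
    reference_bank = torch.cat(all_reference_embeddings, dim=0)     # (N_ref_pixels, C)

    # Pairwise squared Euclidean distances between query and reference pixels.
    dists = torch.cdist(query_embedding, reference_bank, p=2) ** 2  # (H*W, N_ref_pixels)

    # Global matching keeps the nearest reference pixel for each query pixel.
    nearest, _ = dists.min(dim=1)                                   # (H*W,)
    return nearest
```

Passing a list lets the matching treat several reference frames as one pixel bank, which is why the question about how those frames are chosen matters.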

z-x-yang commented 2 years ago

A good question.

In YouTube-VOS, for some sequences not all of the objects appear in the first frame. In other words, these sequences have more than one reference frame.

More details can be found in #6 .
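To illustrate that multi-reference setup, a hypothetical helper that picks one reference frame per newly appearing object could look like the sketch below. The function, its inputs, and the example frame names are assumptions for illustration, not code from this repository:

```python
def collect_reference_frames(frame_names, first_appearance):
    """Hypothetical helper: select every frame in which at least one object
    is annotated for the first time, so each object has a reference mask.

    frame_names:      ordered list of frame identifiers in the sequence
    first_appearance: dict mapping object id -> frame name of its first annotation
    """
    return sorted(set(first_appearance.values()), key=frame_names.index)

# Example: object 1 is annotated from frame "00000", object 2 only enters at
# "00025", so both frames serve as reference frames during inference.
refs = collect_reference_frames(["00000", "00005", "00025", "00030"],
                                {1: "00000", 2: "00025"})
print(refs)  # ['00000', '00025']
```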

ruoqi77 commented 2 years ago

Thanks for your quick answer! So that means multiple reference frames are only used for videos where new objects appear in later frames, as in YouTube-VOS; for other cases (like DAVIS), the reference frame is always the first one, right?

z-x-yang commented 2 years ago

Exactly.

ruoqi77 commented 2 years ago

One more detail: does the channel dimension of the variable 'reference_labels' (i.e., 'obj_nums') include the background? For example, if there is only one object, is the channel count of 'reference_labels' 1 or 2?

z-x-yang commented 2 years ago
  1. The first channel always indicates the background.
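A small PyTorch sketch of that convention (hypothetical, not taken from the repository): one-hot encoding the label map over obj_nums + 1 classes, so a single-object mask yields two channels with channel 0 reserved for the background:

```python
import torch
import torch.nn.functional as F

# Hypothetical 4x4 label map with background (0) and a single object (1).
reference_labels = torch.tensor([[0, 0, 1, 1],
                                 [0, 0, 1, 1],
                                 [0, 0, 0, 0],
                                 [0, 0, 0, 0]])

# One-hot over obj_nums + 1 classes: channel 0 is the background,
# channel 1 the object, so one object gives 2 channels in total.
obj_nums = 1
one_hot = F.one_hot(reference_labels, num_classes=obj_nums + 1)  # (4, 4, 2)
one_hot = one_hot.permute(2, 0, 1).float()                       # (2, 4, 4)
print(one_hot.shape)  # torch.Size([2, 4, 4])
```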