zhang-tao-whu / DVIS

DVIS: Decoupled Video Instance Segmentation Framework
MIT License

Question about the match_embds function #33

Open xiexiaozheng opened 2 months ago

xiexiaozheng commented 2 months ago

@zhang-tao-whu Hello, I have a question about the `match_embds` function in the tracker. Why is the cosine similarity computed from only the first sample in the batch (index 0 along the batch dimension), as shown in the following code?

```python
def match_embds(self, ref_embds, cur_embds):
    #  embeds (q, b, c)
    ref_embds, cur_embds = ref_embds.detach()[:, 0, :], cur_embds.detach()[:, 0, :] # only one sample in a batch
    ref_embds = ref_embds / (ref_embds.norm(dim=1)[:, None] + 1e-6)
    cur_embds = cur_embds / (cur_embds.norm(dim=1)[:, None] + 1e-6)
    cos_sim = torch.mm(ref_embds, cur_embds.transpose(0, 1))
    C = 1 - cos_sim

    C = C.cpu()
    C = torch.where(torch.isnan(C), torch.full_like(C, 0), C)

    indices = linear_sum_assignment(C.transpose(0, 1))
    indices = indices[1]
    return indices
```
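To make the question concrete, here is a minimal sketch of what a per-sample version might look like if every sample in the batch were matched independently. This is only an illustration under stated assumptions, not the authors' implementation: `match_embds_batched` and its `eps` parameter are hypothetical names that do not exist in DVIS, and the sketch assumes the inputs keep the `(q, b, c)` layout from the comment in the original function. It applies the same cost and the same `linear_sum_assignment(C.transpose(0, 1))[1]` step to each sample in turn.

```python
import torch
from scipy.optimize import linear_sum_assignment


def match_embds_batched(ref_embds, cur_embds, eps=1e-6):
    """Hypothetical per-sample variant of match_embds (not part of DVIS).

    ref_embds, cur_embds: tensors of shape (q, b, c).
    Returns a list with one index array of length q per sample, computed
    the same way as in the original function, once per batch element.
    """
    # Put the batch dimension first: (q, b, c) -> (b, q, c).
    ref = ref_embds.detach().permute(1, 0, 2)
    cur = cur_embds.detach().permute(1, 0, 2)

    # L2-normalize each query embedding so the dot product is a cosine.
    ref = ref / (ref.norm(dim=-1, keepdim=True) + eps)
    cur = cur / (cur.norm(dim=-1, keepdim=True) + eps)

    # cost[i, r, k] = 1 - cos(ref query r, cur query k) for sample i.
    cost = (1 - torch.bmm(ref, cur.transpose(1, 2))).cpu()
    cost = torch.where(torch.isnan(cost), torch.full_like(cost, 0), cost)

    # Solve one Hungarian assignment per sample, mirroring the original
    # call linear_sum_assignment(C.transpose(0, 1))[1].
    all_indices = []
    for sample_cost in cost:
        _, col_ind = linear_sum_assignment(sample_cost.transpose(0, 1).numpy())
        all_indices.append(col_ind)
    return all_indices
```

With a batch of size one, the first (and only) entry of the returned list should coincide with what the original function returns. Whether the tracker is ever run with more than one sample per batch is exactly what the question above is asking.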