hkchengrex / XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
https://hkchengrex.com/XMem/
MIT License

About object groups #120

Closed VGrondin closed 1 year ago

VGrondin commented 1 year ago

Hello, thank you for this code, it works really well. I am trying to modify it to handle multiple objects, some of which are not in the first-frame mask. As I understand it, you use groups to handle new objects, but I don't understand why groups are needed. Are they only there for memory removal purposes?

Also, when the affinity matrix is computed, why is it sliced by groups? (from memory_manager.py line 137)

# compute affinity group by group as later groups only have a subset of keys
for gi in range(1, num_groups):
    affinity_one_group = do_softmax(similarity[:, -self.work_mem.get_v_size(gi):], 
        top_k=self.top_k, inplace=(gi==num_groups-1))
    affinity.append(affinity_one_group)

If you could clarify these points it would help me a lot! Best

hkchengrex commented 1 year ago

  1. All objects in the same group share the same affinity matrix.
  2. To achieve this, all values in the same group must have the same number of elements. If two objects do not first appear on the same frame, they do not have the same number of memory elements and cannot be placed in the same group.
  3. The "global" affinity matrix is first computed for all the keys, then sliced to match the size of each group. The first group is always the largest and does not require slicing.
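
The slicing described in the three points above can be illustrated with a minimal sketch. It is not the repository's code: it uses plain Python lists instead of PyTorch tensors, omits the top-k filtering that `do_softmax` applies, and the names `softmax_over_memory`, `affinity_per_group`, and `group_sizes` are hypothetical. It assumes the similarity matrix has one row per memory key (oldest first) and one column per query position, and that a later group only owns the most recent keys.

```python
import math


def softmax_over_memory(sim_rows):
    """Softmax over the memory axis (rows), independently for each query column."""
    num_queries = len(sim_rows[0])
    out = [[0.0] * num_queries for _ in sim_rows]
    for q in range(num_queries):
        col = [row[q] for row in sim_rows]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in col]
        total = sum(exps)
        for i, e in enumerate(exps):
            out[i][q] = e / total
    return out


def affinity_per_group(similarity, group_sizes):
    """Slice one global similarity matrix per group, then normalize each slice.

    similarity: N x Q nested list (N memory keys, Q query positions).
    group_sizes: memory elements owned by each group, largest first (assumed).
    """
    affinities = []
    for size in group_sizes:
        assert 1 <= size <= len(similarity)
        # A later group only has values for the most recent `size` keys,
        # so keep the last `size` rows and renormalize over them.
        affinities.append(softmax_over_memory(similarity[-size:]))
    return affinities


# Toy example: 3 memory keys, 2 query positions.
# Group 0 was present from the start (3 elements); group 1 appeared later (2 elements).
similarity = [[1.0, 0.0], [2.0, 1.0], [3.0, 5.0]]
group_sizes = [3, 2]
affinities = affinity_per_group(similarity, group_sizes)
```

Every object in a group reads out its values with that group's affinity. Because each slice is normalized separately, the affinity for a later group is not just a cropped copy of the first group's affinity.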

VGrondin commented 1 year ago

Thanks for the quick answer! I will think about it and if I don't have further questions will close the issue.

hkchengrex commented 1 year ago

The implementation here might be clearer: https://github.com/hkchengrex/Tracking-Anything-with-DEVA. We call the groups "buckets" in that implementation.

VGrondin commented 1 year ago

Thanks for the update! The paper and results are very interesting. I've been searching for the keyword "bucket" in the code but can't find it. Since XMem itself cannot segment new objects that appear in the scene, could you point me to where in the code propagation and consensus merging are performed? I've been looking in group_module.py, but I'm not sure whether that is where they happen.

hkchengrex commented 1 year ago

@VGrondin feel free to open an issue there so that we can track it.