Unwanted objects getting segmented and tracked (without any human/mask input)

hkchengrex / Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

https://hkchengrex.com/Cutie/

MIT License


Unwanted objects getting segmented and tracked (without any human/mask input) #55

Closed by mukulkhanna 3 months ago

mukulkhanna commented 3 months ago

Hello authors,

Thank you for creating this wonderful tool and for open sourcing the repository.

I am facing an issue with the interactive_demo script: unwanted objects, for which I did not provide any mask (not even through clicks), are getting segmented out of nowhere and tracked.

Below are the input and output videos.

https://github.com/hkchengrex/Cutie/assets/24846546/5bf06bac-16bb-41e0-a11c-2542726e1c13

https://github.com/hkchengrex/Cutie/assets/24846546/09dc585f-718f-4474-817b-c2342ce5ea42

To be specific, I pass the video path using --video and do not provide any masks for the hand. Then, I click "forward propagate".

I was expecting nothing to get detected and tracked, but the hand is getting detected and tracked.

As I understand it, the model only tracks objects that the user provides a mask for. Is my understanding correct? Please clarify.

Thank you!

mukulkhanna commented 3 months ago

Some more context:

In the above example, I did not add any masks and still the hand was getting predicted. In a different example, I also tried to add some masks for the pots in the video (using RITM through left and right clicks; visible in the first frame of the video). Even in that case, I am observing that the model somehow predicts a mask for the hand (in turquoise blue) and assigns it an instance ID of x+1 where x is the number of objects I have manually added through clicks. Here is the output visualization in that case:

https://github.com/hkchengrex/Cutie/assets/24846546/16a06e54-96cd-4b2e-b592-1b560d364597

I am facing this issue on both Ubuntu and Mac.

hkchengrex commented 3 months ago

The model is designed to track only the objects that the user has specified -- that is correct.

In the interactive demo, when the user doesn't specify any objects (thus implicitly specifying an empty mask for all the objects), the model reads from a memory bank that contains only empty masks. This is out-of-distribution, as the model has never been trained with memory that has no masks.

If we know that an object does not appear on the first frame, we can create its corresponding memory bank later to avoid this issue (implemented in, say, the scripting demo with adding/removing objects). In the interactive demo, all memory banks are created at program startup due to UX constraints. The workaround is to label all objects (on the same or separate frames) and add them to the permanent memory before propagation.

And it would always help to specify the exact number of objects that are needed.

hkchengrex commented 3 months ago

Would love to know why this is a problem though -- if an object needs to be tracked, it needs to be labeled; if not, the number of objects should be decreased.

mukulkhanna commented 3 months ago

Thanks for that explanation @hkchengrex! That gives a lot of clarity.

"if an object needs to be tracked, it needs to be labeled;"

In my use case, I wanted to add objects (and their labels) one by one – as they appear in the video – without knowing in advance how many objects there might be.

Therefore, I will look into adding objects (and corresponding memory banks) one by one – instead of all at once, like in the interactive demo.

mukulkhanna commented 3 months ago

What do you think would be the better route to take:

1. Initialize the memory bank for each object one at a time, as they appear, or
2. Pass only the IDs of the objects that have been added (if any) to the InferenceCore.step() function?

hkchengrex commented 3 months ago

The memory bank of an object is first created when the user passes that object ID into the .step function for the first time. So these two are the same.
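To illustrate why the two options are equivalent, here is a toy sketch of lazy memory-bank creation. It is not Cutie's code: `ToyTracker`, its `memory` dictionary, and the string placeholders used as masks are all hypothetical stand-ins, and only the idea (a bank comes into existence the first time its object ID is passed to a step call) is taken from the explanation above.

```python
class ToyTracker:
    """Toy stand-in for a tracker with per-object memory banks (NOT Cutie's code)."""

    def __init__(self):
        # object ID -> list of (frame_index, mask) entries
        self.memory = {}

    def step(self, frame_index, mask=None, objects=None):
        """Process one frame; `objects` lists the IDs that are labelled in `mask`."""
        for obj_id in objects or []:
            # The bank for an ID is created the first time that ID is passed in.
            bank = self.memory.setdefault(obj_id, [])
            bank.append((frame_index, mask))
        # Only objects that already have a memory bank are tracked.
        return sorted(self.memory)


tracker = ToyTracker()
tracked_per_frame = [
    tracker.step(0),                                 # nothing labelled yet
    tracker.step(1, mask="pot mask", objects=[1]),   # object 1 appears
    tracker.step(2),                                 # propagate: object 1 only
    tracker.step(3, mask="hand mask", objects=[2]),  # object 2 appears later
]
print(tracked_per_frame)
```

In this toy model no bank exists for an object until it is labelled, so there is never a bank holding only empty masks.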

hkchengrex commented 3 months ago

If you replace mask reading with your mask input in this script (https://github.com/hkchengrex/Cutie/blob/main/scripting_demo_add_del_objects.py), it should work as expected.
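When supplying your own masks, one piece you may need is working out which object IDs are new on a given frame, so that only those are passed in alongside the mask. A minimal sketch, assuming each mask is an integer label map (a list of rows here) with 0 as background; the helper name `new_object_ids` is hypothetical and is not part of Cutie:

```python
def new_object_ids(label_mask, known_ids):
    """Return the sorted object IDs present in `label_mask` but not in `known_ids`.

    Assumes `label_mask` is a 2D list of integers where 0 means background.
    """
    present = {value for row in label_mask for value in row if value != 0}
    return sorted(present - set(known_ids))


# Frame where only object 1 (a pot) has been labelled.
first_mask = [
    [0, 1, 1],
    [0, 1, 0],
]
# Later frame where object 2 (the hand) is labelled as well.
later_mask = [
    [2, 1, 1],
    [2, 2, 0],
]

known = set()
first_new = new_object_ids(first_mask, known)
known.update(first_new)
later_new = new_object_ids(later_mask, known)
print(first_new, later_new)
```

The same bookkeeping works with array masks; the nested-list form is used here only to keep the sketch dependency-free.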

mukulkhanna commented 3 months ago

Thank you, I will try this out.