siyuanliii / masa

Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
https://matchinganything.github.io
Apache License 2.0

Inference code for tracking as in Figure 3(b) of the paper #10

Closed jru001 closed 3 months ago

jru001 commented 3 months ago

Hi Siyuan, thanks for your great work and the released code! I'm wondering how to track by combining the detection head (of the MASA Adapter) and SAM directly, just like Figure 3(b) in the paper. Will it be released later?

siyuanliii commented 3 months ago

Hi, thanks! SAM's predictions are not consistent across video frames, which leads to heavy flickering due to missing detections. Therefore, we have not provided such a demo yet. We will try to reduce the flickering effect first. However, it is straightforward and simple to test it yourself: replace the detection bounding boxes with the output of the MASA-trained detection head and run.
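
Roughly, the per-frame pipeline would look like the sketch below (minimal and untested; `get_masa_detections` is a hypothetical stand-in for running the MASA detection head plus association, and the SAM calls use the official `segment-anything` predictor):

```python
import cv2
import torch
from segment_anything import sam_model_registry, SamPredictor

# Hypothetical helper: runs the MASA-trained detection head + association
# on one RGB frame and returns (N, 4) xyxy boxes and (N,) track IDs.
from masa_wrapper import get_masa_detections  # stand-in, not a real module

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

cap = cv2.VideoCapture("input.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # 1) Detection + association from the MASA head, in place of the
    #    Grounding-DINO / YOLOX / Co-DETR boxes used in the demos.
    boxes_xyxy, track_ids = get_masa_detections(rgb)
    if len(boxes_xyxy) == 0:
        continue  # nothing detected in this frame

    # 2) Prompt SAM with the tracked boxes to get one mask per object.
    predictor.set_image(rgb)
    boxes = torch.as_tensor(boxes_xyxy, dtype=torch.float, device="cuda")
    boxes = predictor.transform.apply_boxes_torch(boxes, rgb.shape[:2])
    masks, _, _ = predictor.predict_torch(
        point_coords=None,
        point_labels=None,
        boxes=boxes,
        multimask_output=False,
    )  # (N, 1, H, W) boolean masks, one per tracked box

    # masks[i] belongs to track_ids[i]; flickering appears whenever the
    # detector misses an object in some frames, so its mask drops out.
cap.release()
```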

jru001 commented 3 months ago

Thanks for your reply; looking forward to your future work.

ddshan commented 1 month ago

Hi Siyuan,

Congrats on this amazing work! It works very well for tracking!

I am interested in associating SAM masks between frames and still have questions about it.

Regarding the suggestion you gave above -- "replace the detection bounding boxes with the output of the MASA-trained detection head and run" -- could you give some quick guidance on how to use the MASA-trained detection head? Currently, the demos use Grounding-DINO/YOLOX/Co-DETR to find objects. What changes are needed to call the MASA-trained detector instead? For context, I imagine something like the sketch below.
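
Since the repo builds on MMDetection, I'd guess the call looks roughly like this (the config/checkpoint paths and the score threshold are guesses on my part; I'd substitute whatever the repo actually ships):

```python
import cv2
from mmdet.apis import init_detector, inference_detector

# Both paths are guesses; substitute the actual MASA detector config
# and checkpoint provided by the repo.
config_file = "configs/masa/masa_detector.py"
checkpoint_file = "checkpoints/masa_detector.pth"

model = init_detector(config_file, checkpoint_file, device="cuda:0")

frame = cv2.imread("frame.jpg")  # HxWx3 BGR ndarray
result = inference_detector(model, frame)

# MMDetection 3.x returns a DetDataSample; filter by score and take boxes.
inst = result.pred_instances
keep = inst.scores > 0.3  # arbitrary threshold
boxes_xyxy = inst.bboxes[keep].cpu().numpy()  # (N, 4) xyxy boxes for SAM
```

Is this the right direction, or is there a dedicated entry point in the demo scripts for the MASA-trained detector?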