z-x-yang / Segment-and-Track-Anything

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.
GNU Affero General Public License v3.0

A series of questions about the model #122

Closed WIFIwifi8966 closed 8 months ago

WIFIwifi8966 commented 8 months ago

Excellent work! I have a few questions, though. First, given the project's progression from AOT to DeAOT to SAM-Track, could AOT play a role similar to SAM-Track for panoramic segmentation, apart from the ability to use clicks as prompts? Second, the SAM-Track paper categorizes the model as a VOS model. However, based on the examples presented, AOT, DeAOT, and SAM-Track seem able to accomplish tasks similar to MOTS, so can we conclude that this model can solve a wide range of tasks such as VOD/VSS/VOS/MOTS? Third, the SAM model in the README is vit_b. Have you experimented with changing the model, for example using vit_h for testing? (Alternatively, could you point me to where the model can be changed? I couldn't locate that section.) Thank you very much!

z-x-yang commented 8 months ago

It's good to hear you are interested in our work.

About your questions:

  1. AOT/DeAOT is a semi-supervised VOS method that takes reference images with object masks as input. Prompts such as clicks are first converted into object masks, which are then forwarded to AOT/DeAOT.
  2. Yes. SAM-Track can handle various video tasks, including VOD/VSS/VOS/MOTS.
  3. To change the SAM backbone, modify the config here.
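
For point 3, here is a rough sketch of what switching the backbone might look like. The checkpoint file names are the ones published with the official SAM release, and `vit_b`, `vit_l`, and `vit_h` are the model types accepted by `segment_anything.sam_model_registry`. The helper `make_sam_args`, the dict layout, and the `ckpt` directory are assumptions made for illustration, so check them against the actual config file in the repository.

```python
# Official SAM checkpoint file names, keyed by the model type accepted by
# segment_anything.sam_model_registry.
SAM_CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def make_sam_args(model_type="vit_b", ckpt_dir="ckpt"):
    """Build a SAM config dict for the requested backbone.

    Hypothetical helper: the key names and the default `ckpt` directory are
    assumptions for this sketch, not the repository's confirmed config.
    """
    if model_type not in SAM_CHECKPOINTS:
        raise ValueError(
            f"unknown SAM model type {model_type!r}; "
            f"expected one of {sorted(SAM_CHECKPOINTS)}"
        )
    return {
        "model_type": model_type,
        "sam_checkpoint": f"{ckpt_dir}/{SAM_CHECKPOINTS[model_type]}",
    }


# Switch from the default vit_b to vit_h.
sam_args = make_sam_args("vit_h")
print(sam_args)

# Loading the model uses the segment_anything API. It is left commented out
# because it needs the package installed and the weights downloaded:
# from segment_anything import sam_model_registry
# sam = sam_model_registry[sam_args["model_type"]](
#     checkpoint=sam_args["sam_checkpoint"]
# )
```

The model type and the checkpoint have to match: loading `vit_h` weights into a `vit_b` model will fail, so both values are changed together here. The larger backbones also need more GPU memory.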

WIFIwifi8966 commented 8 months ago

Thank you very much!!