siyuanliii / masa

Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
https://matchinganything.github.io
Apache License 2.0
964 stars 62 forks

question on pipeline for training #4

Closed nikky4D closed 3 months ago

nikky4D commented 3 months ago

In the paper, you use MASA with SAM and with detector models like Grounding DINO. I did not understand the inference pipeline with detectors or how to apply it to other domains.

Is the MASA module first trained with SAM, after which the detector head is removed and MASA is used with Grounding DINO as the feature extractor? If MASA is to be applied to other domains without exhaustive SAM segmentations, can it be used with other segmentation modules? How much data does MASA require to learn good features?

siyuanliii commented 3 months ago

Thanks for the question! Yes, the detector head can be removed after training. For MASA with SAM, the detection branch may still be useful as a fast provider of "everything" bbox prompts. For masa-grounding-dino, yes, we reuse Grounding DINO's backbone features. When applying MASA to other domains, you can always call SAM on images from that domain to group the raw pixels into exhaustive segments. In our paper, we use 500K images.
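
To make the inference side more concrete, here is a minimal sketch of how per-instance embeddings (for example, ones computed on top of reused detector backbone features) could be used to link detections across frames. It is an illustration only: the function names `cosine` and `associate`, the `min_sim` threshold, and the greedy cosine-similarity matching are all assumptions made for this example, not the repository's API, and the actual tracker may use a different matching rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_embs, det_embs, min_sim=0.5):
    """Greedy one-to-one matching of detections to existing tracks.

    track_embs: dict mapping track id -> embedding (hypothetical format)
    det_embs:   list of embeddings for the current frame's detections
    min_sim:    hypothetical similarity threshold below which no match is made

    Returns a dict mapping detection index -> track id. Detections that are
    missing from the result would start new tracks.
    """
    # Score every (track, detection) pair.
    pairs = [
        (cosine(t_emb, d_emb), track_id, det_idx)
        for track_id, t_emb in track_embs.items()
        for det_idx, d_emb in enumerate(det_embs)
    ]
    # Visit the most similar pairs first.
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches = {}
    used_tracks = set()
    for sim, track_id, det_idx in pairs:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if track_id in used_tracks or det_idx in matches:
            continue  # each track and each detection is matched at most once
        matches[det_idx] = track_id
        used_tracks.add(track_id)
    return matches


if __name__ == "__main__":
    tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

In this toy run the third detection resembles neither track, so it is left unmatched and would be treated as a new object.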

nikky4D commented 3 months ago

Thanks for the info.