mikel-brostrom / boxmot

BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
GNU Affero General Public License v3.0

MASA: Matching Anything By Segmenting Anything (CVPR24) #1474

Open johnnynunez opened 3 months ago

johnnynunez commented 3 months ago

Description

https://github.com/siyuanliii/masa

mikel-brostrom commented 3 months ago

Wow, this looks very promising

mikel-brostrom commented 3 months ago

From MASA:

"Additionally, our learned detection head speeds up the original SAM dense uniform point proposals for segmenting everything by over tenfold, crucial for tracking applications."

"We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MAS encoder: A heavy ViT-based backbone for feature extraction."

From SAM:

"Given a precomputed image embedding, the prompt encoder and mask decoder run in a web browser, on CPU, in ∼50ms."

SAM uses a ViT as its image embedder. From ViT:

"ViT-B (Base Model): Inference times on high-end GPUs (such as NVIDIA V100 or A100) are typically around 20-40 milliseconds per image."

"ViT-L (Large Model): Inference times are generally longer, around 50-80 milliseconds per image, depending on the exact setup and image resolution."

"ViT-H (Huge Model): Inference times can exceed 100 milliseconds per image due to the increased model complexity and size."

This would require getting a ViT working here as an embedder to start with.
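To put the quoted latencies in tracking terms, here is a rough sketch that converts them to frames per second. It assumes the embedder runs once per frame and ignores the detector, the SAM decoder, and association, so the real throughput would be lower. The figures are the ones quoted above, not measurements made on this repo.

```python
# Latency figures quoted in the comment above, in milliseconds per image.
# They are rough ranges, not measurements made on this repository.
VIT_LATENCY_MS = {
    "ViT-B": (20.0, 40.0),
    "ViT-L": (50.0, 80.0),
    "ViT-H": (100.0, None),  # "can exceed 100 ms": no upper bound was given
}


def fps_range(latency_ms):
    """Convert a (low, high) latency range in ms to a (worst, best) FPS pair."""
    low, high = latency_ms
    best = 1000.0 / low
    worst = 1000.0 / high if high is not None else None
    return worst, best


for name, latency in VIT_LATENCY_MS.items():
    worst, best = fps_range(latency)
    worst_txt = f"{worst:.1f} FPS" if worst is not None else "no lower bound given"
    print(f"{name}: at most {best:.1f} FPS, worst case {worst_txt}")
```

Under these assumptions only ViT-B leaves room for real-time tracking once the rest of the pipeline is added.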

mikel-brostrom commented 3 months ago

This:

"This would require getting a ViT working here as an embedder to start with"

may not be a problem anymore 😄, given that Ultralytics has a ViT encoder

github-actions[bot] commented 2 months ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs. Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

mikel-brostrom commented 1 week ago

Add the model under boxmot/appearance/backbones, together with the model weights URL (https://huggingface.co/dereksiyuanli/masa/resolve/main/gdino_masa.pth). Adapt: https://github.com/siyuanliii/masa/blob/main/masa/apis/masa_inference.py
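A minimal sketch of the weights part, assuming a plain download-and-cache step with the standard library. The helper names `local_weights_path` and `fetch_weights` and the `weights` cache directory are hypothetical and not part of boxmot. Only the URL comes from the comment above.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL taken from the comment above.
MASA_WEIGHTS_URL = (
    "https://huggingface.co/dereksiyuanli/masa/resolve/main/gdino_masa.pth"
)


def local_weights_path(url, cache_dir):
    """Derive the local file path for a weights URL (hypothetical helper)."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(cache_dir, filename)


def fetch_weights(url=MASA_WEIGHTS_URL, cache_dir="weights"):
    """Download the checkpoint once and reuse the cached copy afterwards."""
    path = local_weights_path(url, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `fetch_weights()` would place the file at `weights/gdino_masa.pth`. Loading it into a model is a separate step that depends on the architecture port.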

johnnynunez commented 1 week ago

So is it compatible with your repo now?

mikel-brostrom commented 1 week ago

I have been investigating this thoroughly today. The whole architecture, built using mmdet, needs to be ported to PyTorch. This is no easy feat due to the many custom implementations and optimizations. I don't have the time for such a research project in my free time at the moment.
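One small piece of such a port is getting the existing weights into a plain PyTorch module. Here is a sketch of the key remapping, assuming the checkpoint nests its weights under a `state_dict` entry, as mmdet-style checkpoints commonly do, and that submodules are identified by key prefixes. The keys and prefixes below are made up for illustration and are not the actual MASA layout.

```python
def unwrap_checkpoint(checkpoint):
    """Return the weights mapping, whether or not it is nested under 'state_dict'."""
    return checkpoint.get("state_dict", checkpoint)


def strip_prefix(state_dict, prefix):
    """Keep only entries under `prefix` and drop the prefix from their keys."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy checkpoint with hypothetical keys; real values would be tensors.
checkpoint = {
    "meta": {},
    "state_dict": {
        "backbone.conv1.weight": [0.1],
        "track_head.fc.weight": [0.2],
    },
}

weights = unwrap_checkpoint(checkpoint)
backbone_weights = strip_prefix(weights, "backbone.")
print(sorted(backbone_weights))
```

The remapped dictionary could then be passed to `load_state_dict(..., strict=False)` on a reimplemented module to see which keys are still missing or unexpected. The custom layers themselves still have to be rewritten by hand.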

rolson24 commented 1 week ago

I was working on this a few weeks ago to integrate with huggingface, but never got around to finishing it. Would be happy to share my code from it.

mikel-brostrom commented 1 week ago

"I was working on this a few weeks ago to integrate with huggingface, but never got around to finishing it. Would be happy to share my code from it."

Hey @rolson24! Hope everything is fine. That would be awesome. So just the architecture part?

microchila commented 1 week ago

MASA is very powerful. Will it be supported in the future?

mikel-brostrom commented 1 week ago

"MASA is very powerful. Will it be supported in the future?"

It will take some time because the architecture has to be ported from mmdet to pure PyTorch, but it is ongoing.