roboflow / multimodal-maestro

streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Phi-3.5 Vision
Apache License 2.0

Improve segmentation step to get single label and single marker for each object #16

Open ramsrigouthamg opened 9 months ago

ramsrigouthamg commented 9 months ago

Description

I am trying to segment objects so that each object has exactly one label and a clearly defined segmentation boundary. At the moment, in the post-processing refiner step of the tutorial (Colab) notebook in the repo, the hard-coded 0.02 value doesn't work well for most images and misses correct segmentation clusters. As a result, most individual objects are either missed or merged with the background.

The refiner function does 4 different tasks at once (hole filling, minimum area, max …). It would be good to separate these steps; otherwise, please suggest a better way to isolate individual objects and their segmentation pixels accurately.
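
For illustration, here is a minimal sketch of what separating two of those steps could look like, with each step as a standalone function that has its own tunable parameter. The function names (`relative_area`, `filter_by_relative_area`, `fill_holes`) are hypothetical and are not Maestro's API. The sketch also assumes the 0.02 value is a fraction of the total image area, which the notebook may or may not intend.

```python
from collections import deque

import numpy as np


def relative_area(mask: np.ndarray) -> float:
    """Fraction of image pixels covered by a boolean mask."""
    return float(mask.sum()) / mask.size


def filter_by_relative_area(masks, min_frac=0.02, max_frac=1.0):
    """Keep masks whose relative area lies in [min_frac, max_frac].

    min_frac is assumed to play the role of the hard-coded 0.02;
    exposing it lets the caller tune it per image.
    """
    return [m for m in masks if min_frac <= relative_area(m) <= max_frac]


def fill_holes(mask: np.ndarray) -> np.ndarray:
    """Fill background regions that are not connected to the image border."""
    h, w = mask.shape
    reachable = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood fill with every background pixel on the border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not mask[y, x]:
                reachable[y, x] = True
                queue.append((y, x))
    # 4-connected flood fill through background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not reachable[ny, nx]:
                reachable[ny, nx] = True
                queue.append((ny, nx))
    # Pixels the fill could not reach are either object or enclosed holes.
    return ~reachable
```

Note that this `fill_holes` fills every enclosed hole regardless of size; a size-limited variant would need a per-hole area check. With the steps split like this, the area threshold can be tuned per image without affecting hole filling.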

SkalskiP commented 9 months ago

Hi, @ramsrigouthamg! 👋🏻 Thank you for your interest in our project. You can already run the following functions independently:

ramsrigouthamg commented 9 months ago

Thanks, @SkalskiP! Is there support, or potential to add support, for https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once in this library? Basically, I was trying to get better segmentation masks than the traditional SAM-and-merge approach, which is error-prone.

SkalskiP commented 9 months ago

I'd love to. The Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V paper used SEEM as well.

But I want Maestro to be easy to install. I don't want to force people to go through this installation process when installing Maestro. So, if we integrate it, we need a SEEM version that is easy to install.

ramsrigouthamg commented 9 months ago

Understood, thanks for the quick response!

SkalskiP commented 9 months ago

Alternatively, we can make it pluggable, so that anyone who installs it and goes through that pain can use it. Do you have experience with SEEM?
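
As a rough sketch of the pluggable idea, one option is a small registry with lazy imports, so that an optional backend is imported only when someone asks for it. Everything here is hypothetical: `register_segmenter`, `optional_backend`, `get_segmenter`, the `seem_wrapper` module, and the `SeemSegmenter` class are illustrative names, not part of Maestro or SEEM.

```python
import importlib
from typing import Callable, Dict

# Maps a backend name to a zero-argument factory that builds the segmenter.
_SEGMENTERS: Dict[str, Callable[[], object]] = {}


def register_segmenter(name: str, factory: Callable[[], object]) -> None:
    """Make a backend available under `name` without importing it yet."""
    _SEGMENTERS[name] = factory


def optional_backend(module_name: str, attr: str) -> Callable[[], object]:
    """Build a factory that imports `module_name` only when first used."""
    def factory():
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            raise ImportError(
                f"Backend requires the optional package '{module_name}'. "
                "Install it separately to enable this backend."
            ) from exc
        # Instantiate the class (or call the callable) named `attr`.
        return getattr(module, attr)()
    return factory


def get_segmenter(name: str):
    """Look up a backend by name and build it on demand."""
    if name not in _SEGMENTERS:
        raise KeyError(f"Unknown segmenter '{name}'. Available: {sorted(_SEGMENTERS)}")
    return _SEGMENTERS[name]()


# Hypothetical module and class names; a real SEEM wrapper would define its own.
register_segmenter("seem", optional_backend("seem_wrapper", "SeemSegmenter"))
```

With this layout the core package keeps no hard dependency on the optional backend: `get_segmenter("seem")` raises an `ImportError` with an install hint only for users who request that backend without having installed it.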