hustvl / ViTMatte

[Information Fusion (Vol.103, Mar. '24)] Boosting Image Matting with Pretrained Plain Vision Transformers
MIT License

Softmax outputs? #20

Open rb-synth opened 10 months ago

rb-synth commented 10 months ago

I have an image with multiple objects and a background. Is there any way to produce mattes such that the sum at any given pixel is equal to one? In other words, to consider the objects at the same time rather than individually? When they are considered independently, I sometimes end up with a blank region between two touching objects, which gives the impression that there is background between the two objects even though I know this is not the case.

Any ideas what to do here?

rb-synth commented 10 months ago

For example, I take an image, get masks (with SAM) and get mattes. Then I visualise alpha_1 + alpha_2 != 1:

Image: [image: cats-and-dogs]

Masks: [images: mask_18, mask_27]

Mattes: [images: alpha_18, alpha_27]

Areas where sum != 1: [image: diff]
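The check described in this comment (summing the per-object mattes and looking for pixels where the sum is not one) can be sketched as follows. This is only an illustration: the function name `sum_deviation`, the `region` and `tol` parameters, and the toy values are assumptions, not code from the thread or from ViTMatte.

```python
import numpy as np

def sum_deviation(alphas, region=None, tol=1e-3):
    """Sum per-object alpha mattes and flag pixels whose sum is not ~1.

    alphas: list of HxW float arrays in [0, 1], one per object.
    region: optional HxW boolean array restricting the check, for example
            the union of the object masks (hypothetical parameter).
    """
    total = np.sum(np.stack(alphas, axis=0), axis=0)
    deviates = np.abs(total - 1.0) > tol
    if region is not None:
        deviates &= region.astype(bool)
    return total, deviates

# Toy 1x4 row standing in for two mattes of touching objects.
alpha_a = np.array([[1.0, 0.6, 0.1, 0.0]])
alpha_b = np.array([[0.0, 0.1, 0.6, 1.0]])
total, deviates = sum_deviation([alpha_a, alpha_b])
```

In the toy row, the two middle pixels sum to roughly 0.7, so they are flagged. Without a `region`, pure background pixels (sum 0) are flagged as well, so restricting the check to the union of the masks may be closer to what the diff image shows.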

JingfengYao commented 10 months ago

Personally, I would try generating a single trimap covering both objects instead of two separate ones. You can do this easily with our matte-anything.
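A minimal sketch of the single-trimap idea, assuming the per-object trimaps already exist and use the common 0 / 128 / 255 encoding (background / unknown / foreground). That encoding and the helper name `merge_trimaps` are assumptions for illustration; this is not matte-anything's actual code.

```python
import numpy as np

def merge_trimaps(trimaps):
    """Merge per-object trimaps into one trimap for the union of the objects.

    Assumes each trimap is an HxW uint8 array with 0 = background,
    128 = unknown, 255 = foreground. With this encoding a pixel-wise
    maximum gives: foreground if any object is foreground there,
    unknown if any is unknown and none is foreground, else background.
    """
    return np.stack(trimaps, axis=0).max(axis=0)

# Toy 1x4 row: object A on the left, object B on the right.
trimap_a = np.array([[255, 128, 0, 0]], dtype=np.uint8)
trimap_b = np.array([[0, 128, 128, 255]], dtype=np.uint8)
merged = merge_trimaps([trimap_a, trimap_b])
```

A pixel that is unknown in every input stays unknown, so the band between two touching objects is still left for the matting model to resolve. The result is a single matte for the union of the objects, not one matte per object.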

rb-synth commented 10 months ago

Hi, matte-anything appears to be segment anything, followed by ViTMatte, so how would that be different from this example? All the matte-anything examples give binary masks, but I need multi-instance matting – is this possible with matte-anything?

rb-synth commented 10 months ago

To be clear, in this toy example I couldn't create just one trimap since I have three classes – dog, cat, background.

JingfengYao commented 10 months ago

Do you mean something like this? [images: 1700067377392, 1700067439937]

rb-synth commented 10 months ago

No, this is still binary. It's either:

  1. FG: cat A, BG: everything else,
  2. FG: cats A and B, BG: everything else, or
  3. FG: cat B, BG: everything else.

I would want it to matte both of the cats independently, but in such a way that at the border between the two cats the sum of the mattes == 1.
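One way to write this constraint down is as a post-processing step: rescale the independently predicted mattes so that, at each pixel, they add up to a matte of the union of the objects. The sketch below is a heuristic for illustration only. It is not something ViTMatte or matte-anything provides, and the function name `renormalize`, the `union_alpha` input, and the toy values are assumptions.

```python
import numpy as np

def renormalize(alphas, union_alpha, eps=1e-6):
    """Rescale per-object mattes so they sum to union_alpha at every pixel.

    alphas:      list of HxW float arrays in [0, 1], one per object.
    union_alpha: HxW float array, a matte of all objects taken together
                 (assumed to come from a separate single-trimap run).
    Pixels that no object claims (sum below eps) are left at zero.
    """
    stacked = np.stack(alphas, axis=0).astype(np.float64)
    total = stacked.sum(axis=0)
    # Each object's share of the pixel; zero where no object claims it.
    shares = np.divide(stacked, total,
                       out=np.zeros_like(stacked), where=total > eps)
    return shares * union_alpha

# Toy 1x4 row: the two middle pixels are the border between the objects.
alpha_a = np.array([[1.0, 0.6, 0.1, 0.0]])
alpha_b = np.array([[0.0, 0.2, 0.6, 0.0]])
union_alpha = np.array([[1.0, 1.0, 1.0, 0.0]])
fixed = renormalize([alpha_a, alpha_b], union_alpha)
```

After rescaling, the mattes sum to `union_alpha`, so the border pixels sum to one and the background matte can be taken as `1 - union_alpha`. The split between the two objects at the border is still just the ratio of the original predictions, so this does not settle which per-object alpha is correct there.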

JingfengYao commented 10 months ago

I see, that's an interesting perspective. However, this seems difficult for matting models like ViTMatte, since their training framework is different. From my own perspective, it is also difficult to say which alpha value (for example, 0.5 or 0.6) is absolutely correct at the edges of an object.