storyicon / comfyui_segment_anything

Based on GroundingDino and SAM, use semantic strings to segment any element in an image. The comfyui version of sd-webui-segment-anything.
Apache License 2.0

GroundingDinoSAMSegment MASK Output issue with Video/ImageBatches #7

Closed dnl13 closed 1 year ago

dnl13 commented 1 year ago

Hey, first of all, thank you for these very nice nodes! But there is an issue with the MASK output of the GroundingDinoSAMSegment node. When I feed it a video input, the IMAGE output looks correct, but the MASK output is just one large image with all frames stacked vertically instead of being an image batch like the one the ImagePreview shows. ~I am guessing there is something "wrong" in mask_decoder_hq.py at lines 135-152?~


dnl13 commented 1 year ago

OK, figured it out.

In node.py:
...
def split_image_mask(image):
    image_rgb = image.convert("RGB")
    image_rgb = np.array(image_rgb).astype(np.float32) / 255.0
    image_rgb = torch.from_numpy(image_rgb)[None,]
    if 'A' in image.getbands():
        mask = np.array(image.getchannel('A')).astype(np.float32) / 255.0
        mask = 1. - torch.from_numpy(mask)
    else:
        mask = torch.zeros((64, 64), dtype=torch.float32, device="cpu")
    return (image_rgb, mask)

The line mask = 1. - torch.from_numpy(mask) inverts the mask and, because it returns the mask without a leading batch dimension, also causes the frames to be stacked vertically.

I changed it to:

mask = torch.from_numpy(mask)[None,]

If you still want the mask inverted (though I don't know why you would):

mask = 1. - torch.from_numpy(mask)[None,]

This resolved the issue and also makes inverting the mask afterwards unnecessary.
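To illustrate why the extra [None,] matters, here is a minimal sketch. It assumes the per-frame masks are later joined with torch.cat along dimension 0, which is not shown in this thread; the frames list and its sizes are hypothetical stand-ins for real alpha masks.

```python
import torch

# Hypothetical stand-ins for per-frame masks: 3 frames, each 4 x 6 pixels.
frames = [torch.rand(4, 6) for _ in range(3)]

# Without a batch dimension, concatenating along dim 0 glues the frames
# together vertically into one tall 2D image.
stacked = torch.cat(frames, dim=0)

# With [None,], each mask becomes (1, H, W), so concatenation yields a
# proper batch of shape (N, H, W).
batched = torch.cat([m[None,] for m in frames], dim=0)

print(stacked.shape)  # torch.Size([12, 6])
print(batched.shape)  # torch.Size([3, 4, 6])
```

If the node combines masks in some other way, the exact shapes may differ, but the idea is the same: every mask needs its own leading batch dimension before the masks are combined.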


I will send a pull request...

orzlenmo commented 1 year ago

Thank you, you solved my problem.