hkchengrex / Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
https://hkchengrex.com/Cutie/
MIT License

Issue while running the interactive demo + Video Editing Documentation #67

Open sarthakg2002 opened 1 month ago

sarthakg2002 commented 1 month ago

[Screenshot: Screenshot 2024-05-16 091226]

What am I doing wrong? I followed the installation guide.

Also, I want to achieve the insertion feature from the demo, where an image was added to the dance video. Could you guide me to that part of the code? From scripting_demo_add_del_objects.py, it's not clear where the video editing is done (it only deals with images, not video frames). Is there somewhere I could find the code to generate similar results?

hkchengrex commented 1 month ago

I would have to check the error message later. Is it possible that your workspace is corrupted (i.e., created but with no image present)? Try removing the entire workspace and starting again.

For the video editing demo, you can use the layered mode in the interactive demo. If you are running Cutie as a script, you would have to implement the layering yourself, but it should be fairly straightforward. The mask is used to separate the foreground from the background, and the layers are rendered in this order: background -> insertion layer -> foreground.
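
A minimal NumPy sketch of that compositing order, assuming float images in [0, 1], an H x W x 4 RGBA insertion layer, and an H x W foreground probability mask from Cutie (the function and argument names here are mine, not the repo's):

import numpy as np

def composite_layers(frame, layer_rgba, fg_mask):
    # frame:      H x W x 3 float video frame (supplies background and foreground)
    # layer_rgba: H x W x 4 float RGBA insertion layer
    # fg_mask:    H x W float foreground probability from Cutie
    layer_rgb = layer_rgba[..., :3]
    layer_alpha = layer_rgba[..., 3:4]
    fg = fg_mask[..., None]

    # Background first, then the insertion layer over it...
    out = frame * (1 - layer_alpha) + layer_rgb * layer_alpha
    # ...then the masked foreground on top of everything.
    out = out * (1 - fg) + frame * fg
    return np.clip(out, 0, 1)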

sarthakg2002 commented 1 month ago

Got it, thanks!

I was looking at the code and couldn't find where the mask of the image is computed; everywhere, the mask is simply loaded. Does this project assume the mask is provided?

If so, which model was used to get the masks for your dataset? I was thinking of using SAM for this purpose. Will that work?

sarthakg2002 commented 1 month ago

I was following this notebook: https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing&authuser=1 but it doesn't do layering, so I took the initial setup from there and got the rest from main_controller.py. However, to use the overlay_layer_torch() function I needed to use the ResourceManager class. When I pass the config variable (cfg), it gives an error that there is no images key in cfg. How can I initialize it with images and the other keys as well? (I'm guessing the video and max_overall_size keys will also raise errors.)

hkchengrex commented 1 month ago

Hi, the first masks are always given in the VOS setting. You can indeed use SAM to create those masks.
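
For reference, a minimal sketch of getting a first-frame mask with SAM (this assumes the segment-anything package and a downloaded checkpoint; the file names and the click point below are placeholders, not anything from this repo):

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load SAM and wrap it in a predictor (checkpoint path is a placeholder).
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth').to('cuda')
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread('first_frame.jpg'), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# One positive click on the object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[400, 300]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
first_mask = masks[0].astype(np.uint8)  # H x W in {0, 1}; use as Cutie's first mask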

For editing, I think it's easier to copy the masking logic and create your own function.

Sorry that I'm quite busy these days and cannot provide an example for now.

sarthakg2002 commented 1 month ago

Hey, can you please let me know how to initialize the ResourceManager class? I'm really having a hard time figuring that part out.

hkchengrex commented 1 month ago

> However, to use the overlay_layer_torch() function I needed to use the ResourceManager class

https://github.com/hkchengrex/Cutie/blob/b8930f0b36888f933d353896e28b1b89e2fbfe86/gui/interactive_utils.py#L195-L215

I don't think it is needed. In any case, the logic is quite straightforward, just 10 lines of code, and you shouldn't need to go through the internal logic in the controller (which is designed for the GUI).

sarthakg2002 commented 4 weeks ago

How do I get the variables prob and target_objects? And is layer just the image to be inserted between the foreground and background, converted to a torch tensor?

hkchengrex commented 4 weeks ago

prob is our prediction before argmax. target_objects is a list of objects that should be used in masking. Yes.
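
To make this concrete, here is a sketch of where those values come from when driving Cutie from a script; it mirrors the pattern in scripting_demo.py (the file names are placeholders):

import numpy as np
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

from cutie.inference.inference_core import InferenceCore
from cutie.utils.get_default_model import get_default_model

cutie = get_default_model()
processor = InferenceCore(cutie, cfg=cutie.cfg)

image = to_tensor(Image.open('frame_0000.jpg')).cuda().float()  # 3 x H x W in [0, 1]
mask_np = np.array(Image.open('first_mask.png'))                # indexed first-frame mask

# target_objects: every non-zero id in the mask (0 is background).
target_objects = np.unique(mask_np)
target_objects = target_objects[target_objects != 0].tolist()

mask = torch.from_numpy(mask_np).cuda()

# prob: the soft prediction before argmax, shape (num_objects + 1) x H x W,
# with channel 0 being the background.
prob = processor.step(image, mask, objects=target_objects)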

sarthakg2002 commented 4 weeks ago

But how do I get those values? For example, if I have two torch images img and overlay, which I got using imread and then image_to_torch, how do I get the values for those variables?

sarthakg2002 commented 4 weeks ago

To be more specific I'm trying to add an object (image) into a video and track it using pose estimation at a specific coordinate. Here is my current code for handling a single frame:

import torch
import numpy as np

def image_to_torch(frame: np.ndarray):
    # H x W x C uint8 frame -> C x H x W float tensor in [0, 1] on the GPU
    device = 'cuda'
    frame = frame.transpose((2, 0, 1))
    frame = torch.from_numpy(frame).float().to(device, non_blocking=True) / 255
    return frame

def overlay_image_alpha(img, img_overlay, x, y, alpha_mask):
    # Clip the overlay rectangle to the image bounds.
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])

    # Corresponding region inside the overlay itself.
    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)

    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return img

    overlay_slice = img_overlay[y1o:y2o, x1o:x2o, :]
    mask_slice = alpha_mask[y1o:y2o, x1o:x2o]

    img_slice = img[y1:y2, x1:x2, :]

    # Standard alpha blending within the clipped region.
    alpha = mask_slice[..., None] / 255.0
    img[y1:y2, x1:x2, :] = (1.0 - alpha) * img_slice + alpha * overlay_slice[..., :3]

    return img

def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)
    # obj_mask = torch.zeros_like(torch.tensor(1, dtype=torch.int8)).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 3].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img)
    img_final = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
    img_final = (img_final * 255).byte().cpu().numpy()
    return img_final

Not sure how to get obj_mask.

hkchengrex commented 3 weeks ago

Where are you using Cutie? The mask comes from there.

sarthakg2002 commented 3 weeks ago

Here is the updated code. I had to change the layer_alpha line to use index 2 instead of 3, but I'm getting an error that the tensor sizes must match:

import numpy as np
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

from cutie.inference.inference_core import InferenceCore
from cutie.utils.get_default_model import get_default_model

def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)

    cutie = get_default_model()
    processor = InferenceCore(cutie, cfg=cutie.cfg)
    pil_image = img[:, :, ::-1]
    pil_image = Image.fromarray(pil_image)
    palette = [(0, 0, 0), (255, 255, 255)]
    indexed_image = pil_image.convert('P', palette=palette)
    mask = indexed_image.point(lambda p: 0 if p == 0 else 1)
    objects = np.unique(np.array(mask))
    objects = objects[objects != 0].tolist()
    mask = torch.from_numpy(np.array(mask)).cuda()
    image = to_tensor(pil_image).cuda().float()
    prob = processor.step(image, mask, objects=objects)

    obj_mask = prob[np.array(objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 2].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img).permute(2, 0, 1)
    img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)

    img_overlay = (img_overlay * 255).byte().cpu().numpy()
    return img_overlay

Error:

 img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
                       ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (1280) must match the size of tensor b (720) at non-singleton dimension 2

hkchengrex commented 3 weeks ago

It would not work if you changed it from 3 to 2. You need a transparent PNG image as the layer image. Also, your layer image might not have the same dimensions as the input. You would need to resize/pad it.
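
A sketch of preparing such a layer image with OpenCV (the file name and the frame variable are assumptions on my part):

import cv2

layer = cv2.imread('layer.png', cv2.IMREAD_UNCHANGED)  # keep the alpha channel
assert layer.shape[2] == 4, 'the layer must be a transparent (RGBA) PNG'

# Resize the layer to match the video frame before compositing.
h, w = frame.shape[:2]  # frame: the current H x W x 3 video frame
layer = cv2.resize(layer, (w, h), interpolation=cv2.INTER_AREA)

layer_rgb = layer[:, :, 2::-1] / 255.0   # BGR -> RGB, floats in [0, 1]
layer_alpha = layer[:, :, 3:4] / 255.0   # H x W x 1 alpha in [0, 1]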