Open sarthakg2002 opened 1 month ago
I would have to check the error message later. Is it possible that your workspace is corrupted (i.e., created but with no image present)? Try removing the entire workspace and starting again.
For the video editing demo, you can use the layered mode in the interactive demo. If you are running Cutie as a script, you would have to implement the layering yourself, but it should be fairly straightforward. The mask is used to separate the foreground from the background, and the layers are rendered in this order: background → insertion layer → foreground.
Got it thanks!
I was looking at the code and couldn't find where the mask of the image is computed; everywhere, the mask is being loaded. Does this project assume the mask is provided?
If so, which model was used to get masks for your dataset? I was thinking of using SAM for this purpose. Will that work?
I was following this notebook: https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing&authuser=1 but it doesn't do layering, so I took the initial setup from there and the rest from `main_controller.py`. However, to use the `overlay_layer_torch()` function I needed the `ResourceManager` class. When I pass the config variable (`cfg`), it gives me an error about the missing key `images` in `cfg`. How can I initialize it for `images` and the other keys as well (I'm guessing the `video` and `max_overall_size` keys will also give errors)?
Hi, the first masks are always given in the VOS setting. You can indeed use SAM to create those masks.
For editing, I think it's easier to copy the masking logic and create your own function.
Sorry that I'm quite busy these days and cannot provide an example for now.
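Once SAM (or any other tool) gives you per-object binary masks, Cutie's VOS setting expects a single integer-indexed mask where each object has its own ID and 0 is background. A minimal sketch of that conversion (the helper name and the toy masks below are illustrative, not from the repo):

```python
import numpy as np

def to_indexed_mask(binary_masks):
    """Combine per-object boolean HxW masks into one integer-indexed mask.

    Object i in the list gets ID i+1; 0 stays background.
    Later masks overwrite earlier ones where they overlap.
    """
    h, w = binary_masks[0].shape
    indexed = np.zeros((h, w), dtype=np.uint8)
    for obj_id, m in enumerate(binary_masks, start=1):
        indexed[m] = obj_id
    return indexed

# Example with two toy 4x4 masks standing in for SAM output:
a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True
b = np.zeros((4, 4), dtype=bool); b[2:, 2:] = True
mask = to_indexed_mask([a, b])
objects = np.unique(mask)[1:].tolist()  # non-background IDs -> [1, 2]
```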
Hey, can you please let me know how to initialize the `ResourceManager` class? I'm really having a hard time figuring that part out.
> However, to use the `overlay_layer_torch()` function I needed the `ResourceManager` class
I don't think it is needed. In any case, the logic is quite straightforward with just 10 lines of code. I don't think you would need to go through the internal logic in the controller (which is designed for the GUI).
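For reference, the layered compositing described above can indeed be sketched in roughly ten lines. This is a hedged reimplementation under assumed tensor shapes, not the repo's actual `overlay_layer_torch`:

```python
import torch

def composite_layer(image, layer, prob, target_objects):
    """Render background -> insertion layer -> foreground.

    image: HxWx3 float tensor in [0, 1] (the video frame)
    layer: HxWx4 RGBA float tensor in [0, 1] (the inserted image)
    prob:  (num_objects+1)xHxW soft mask, channel 0 = background (assumed)
    """
    # Foreground = union of the tracked objects' soft masks.
    obj_mask = prob[target_objects].sum(dim=0).unsqueeze(-1)  # HxWx1
    layer_rgb, layer_alpha = layer[:, :, :3], layer[:, :, 3:4]
    # Background shows only where neither the layer nor the foreground covers.
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    out = (image * background_alpha
           + layer_rgb * (1 - obj_mask) * layer_alpha
           + image * obj_mask)
    return out.clip(0, 1)
```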
How do I get the variables `prob` and `target_objects`? Is `layer` just the image to be inserted between the foreground and background, converted to a torch tensor?
`prob` is our prediction before argmax. `target_objects` is a list of objects that should be used in masking. Yes.
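Concretely, the hard mask and the object list can be derived from `prob` with an argmax. The toy tensor below stands in for the real prediction (assumed shape `(num_objects+1, H, W)` with channel 0 as background):

```python
import torch

# Toy prob: background + 2 objects on a 2x3 frame.
prob = torch.tensor([
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],  # channel 0: background
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.1]],  # channel 1: object 1
    [[0.1, 0.1, 0.8], [0.8, 0.8, 0.1]],  # channel 2: object 2
])
mask = torch.argmax(prob, dim=0)             # HxW integer mask, 0 = background
target_objects = mask.unique()[1:].tolist()  # non-background IDs -> [1, 2]
```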
But how do I get those values? For example, if I have two torch images `img` and `overlay`, which I got using `imread` and then `image_to_torch`, how do I get the values for those variables?
To be more specific, I'm trying to add an object (image) into a video and track it using pose estimation at a specific coordinate. Here is my current code for handling a single frame:
```python
import torch
import numpy as np

def image_to_torch(frame: np.ndarray):
    device = 'cuda'
    frame = frame.transpose((2, 0, 1))
    frame = torch.from_numpy(frame).float().to(device, non_blocking=True) / 255
    return frame

def overlay_image_alpha(img, img_overlay, x, y, alpha_mask):
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])
    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)
    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return img
    overlay_slice = img_overlay[y1o:y2o, x1o:x2o, :]
    mask_slice = alpha_mask[y1o:y2o, x1o:x2o]
    img_slice = img[y1:y2, x1:x2, :]
    alpha = mask_slice[..., None] / 255.0
    img[y1:y2, x1:x2, :] = (1.0 - alpha) * img_slice + alpha * overlay_slice[..., :3]
    return img

def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)
    # obj_mask = torch.zeros_like(torch.tensor(1, dtype=torch.int8)).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 3].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img)
    img_final = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
    img_final = (img_final * 255).byte().cpu().numpy()
    return img_final
```
Not sure how to get `obj_mask`.
Where are you using Cutie? The mask comes from there.
Here is the updated code. I had to change the `layer_alpha` line by putting index 2 instead of 3, but I'm getting an error that the sizes of the tensors should match:
```python
def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)

    cutie = get_default_model()
    processor = InferenceCore(cutie, cfg=cutie.cfg)

    pil_image = img[:, :, ::-1]
    pil_image = Image.fromarray(pil_image)
    palette = [(0, 0, 0), (255, 255, 255)]
    indexed_image = pil_image.convert('P', palette=palette)
    mask = indexed_image.point(lambda p: 0 if p == 0 else 1)
    objects = np.unique(np.array(mask))
    objects = objects[objects != 0].tolist()
    mask = torch.from_numpy(np.array(mask)).cuda()
    image = to_tensor(pil_image).cuda().float()
    prob = processor.step(image, mask, objects=objects)

    obj_mask = prob[np.array(objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 2].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img).permute(2, 0, 1)
    img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
    img_overlay = (img_overlay * 255).byte().cpu().numpy()
    return img_overlay
```
Error:

```
img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
              ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (1280) must match the size of tensor b (720) at non-singleton dimension 2
```
It would not work if you changed it from 3 to 2. You need a transparent PNG image as the layer image. Also, your layer image might not have the same dimensions as the input. You would need to resize/pad it.
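The resize step can be sketched as follows (a hedged example, assuming the layer is an HxWx4 RGBA float tensor; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def fit_layer_to_frame(layer, height, width):
    """Resize an HxWx4 RGBA layer tensor to the frame size.

    A simple bilinear resize; padding with transparent pixels would also
    work if you want to preserve the layer's aspect ratio.
    """
    chw = layer.permute(2, 0, 1).unsqueeze(0)   # 1x4xHxW, as interpolate expects
    resized = F.interpolate(chw, size=(height, width),
                            mode="bilinear", align_corners=False)
    return resized.squeeze(0).permute(1, 2, 0)  # back to HxWx4
```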
What am I doing wrong? I followed the installation guide.
Also, I want to achieve the insertion feature from the demo where the image was added to the dance video. Could you guide me to that part of the code? From `scripting_demo_add_del_objects.py`, it's not clear where the video editing is being done (it only deals with images, not video frames). Is there somewhere I could find the code to generate similar results?