hkchengrex / Tracking-Anything-with-DEVA

[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
https://hkchengrex.com/Tracking-Anything-with-DEVA/
Other

Extract the ROI of the masked regions #31

Closed: vanamagautam24 closed this issue 1 year ago.

vanamagautam24 commented 1 year ago

Hello,

I have been able to apply masks to the humans in a video. With the Grounding DINO model, I can extract the ROI (region of interest) of every human in the frame. However, the ROI also contains some background noise. It is not too bad, but I was wondering whether there is a way to extract only the masked humans.

Currently, when I try to visualize the output mask, I don't see any humans. The output is mostly an array of zeros.

For example, if there are 5 humans in the frame, I would like the ROI of only the masked humans and nothing else. If there is a person with an orange mask, I would like to extract only that person and the orange mask.

I want to run pose estimation on the segmented/masked humans instead of on the entire frame. Running it on the whole frame would defeat the purpose of segmenting and would also reduce accuracy.

Please let me know if this is possible or not. Thanks a lot for your time.
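A likely reason the saved mask looks empty: as the maintainer notes later in this thread, the mask stores an object id per pixel, so its values are small integers (assumed here to be 0 for background and 1, 2, ... for objects). Written straight to an 8-bit image, those values are all nearly black. The sketch below stretches the ids to the 0-255 range for viewing only; `id_mask_to_gray` is a hypothetical helper, not part of DEVA.

```python
import numpy as np


def id_mask_to_gray(id_mask: np.ndarray) -> np.ndarray:
    """Stretch an integer object-id mask to 0-255 so it is visible when saved.

    Assumes 0 is background and each object has a positive integer id.
    """
    id_mask = np.asarray(id_mask)
    max_id = int(id_mask.max())
    if max_id == 0:
        # Nothing segmented in this frame: return an all-black image
        return np.zeros(id_mask.shape, dtype=np.uint8)
    # Spread the ids evenly between 0 (background) and 255 (largest id)
    return (id_mask.astype(np.float32) * (255.0 / max_id)).round().astype(np.uint8)
```

For example, `cv2.imwrite('mask_vis.png', id_mask_to_gray(mask))` would show each object as a distinct gray level. This is only for inspection; the raw ids should still be used for extraction.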

hkchengrex commented 1 year ago

We do produce an output mask for every frame. I don't see why those masks couldn't be used to crop out the humans.
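A minimal sketch of cropping out all humans at once, assuming the per-frame mask has already been resized to the frame's height and width and uses 0 for background and a positive id for each object. `extract_foreground` is a hypothetical helper name; treating "any nonzero id" as foreground keeps every segmented object in one image.

```python
import numpy as np


def extract_foreground(frame: np.ndarray, id_mask: np.ndarray) -> np.ndarray:
    """Keep every segmented object and black out the background.

    frame: H x W x C image; id_mask: H x W array of object ids (0 = background).
    """
    if frame.shape[:2] != id_mask.shape:
        raise ValueError("mask must have the same height and width as the frame")
    foreground = id_mask > 0  # True wherever any object was segmented
    out = np.zeros_like(frame)
    out[foreground] = frame[foreground]
    return out
```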

vanamagautam24 commented 1 year ago

Hey, thanks for getting back to me. As you said, I'm able to extract the masks on their own, but only one or two humans end up in the extracted output and the others are ignored. In the video itself every human is masked, and accurately; it's only when I extract the masks that some of the humans are missing.

The code snippet below may help you see where I'm going wrong.

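```python
import os
import cv2
import numpy as np
# deva, gd_model, sam_model, result_saver, ti, frame, roi_output_dir and writer are defined earlier in the script
frame_info = process_frame_text(deva, gd_model, sam_model, 'null.png', result_saver, ti, image_np=frame)
if frame_info and frame_info.mask is not None:
    mask = frame_info.mask.cpu().numpy().astype(np.uint8)
    # Resize to the frame size; nearest-neighbour interpolation keeps the integer labels intact
    mask_resized = cv2.resize(mask, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
    # Copy only the pixels labeled 1 in the mask; everything else stays black
    roi = np.zeros_like(frame)
    roi[mask_resized == 1] = frame[mask_resized == 1]
    roi_filename = os.path.join(roi_output_dir, f'segmented_human_{ti}.png')
    cv2.imwrite(roi_filename, roi)
    writer.write(frame)
```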
hkchengrex commented 1 year ago

If you compare the mask with 1, you are only extracting the object with id=1. You would probably need a loop over the object ids to extract them all.
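One way to write that loop, as a sketch: it assumes the resized mask is an H x W array of object ids with 0 as background and the frame is an H x W x C array. `crop_each_object` is a hypothetical helper name. Each object gets its own crop, tight to its bounding box, with pixels from the background or other objects zeroed out, which may suit a pose estimator that expects one person per image.

```python
import numpy as np


def crop_each_object(frame: np.ndarray, id_mask: np.ndarray) -> dict:
    """Return {object_id: crop} with one tight crop per segmented object.

    frame: H x W x C image; id_mask: H x W array of object ids (0 = background).
    Pixels inside a crop that belong to the background or to other objects
    are zeroed, so each crop contains only that object.
    """
    crops = {}
    for obj_id in np.unique(id_mask):
        if obj_id == 0:  # skip background
            continue
        obj_mask = id_mask == obj_id
        ys, xs = np.nonzero(obj_mask)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        # Zero everything that is not this object, then crop to its bounding box
        isolated = np.where(obj_mask[..., None], frame, 0)
        crops[int(obj_id)] = isolated[y0:y1, x0:x1]
    return crops
```

In the snippet above, this could replace the `== 1` comparison, e.g. `for obj_id, crop in crop_each_object(frame, mask_resized).items(): cv2.imwrite(os.path.join(roi_output_dir, f'segmented_human_{ti}_{obj_id}.png'), crop)`.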