Can metaseg input a video and output the class label?

kadirnar / segment-anything-video

MetaSeg: Packaged version of the Segment Anything repository

Apache License 2.0


Can metaseg input a video and output the class label? #91

Open CR400AF-A opened 1 year ago

CR400AF-A commented 1 year ago

Thanks for your great work!

I have a specific requirement for my project and I'm wondering if metaseg can cater to it. I need to input an image with dimensions H×W×3 (height × width × 3 channels) and obtain an "image" output with class labels in the form H×W×1 (height × width × 1 channel). The single channel here only needs to indicate which pixels belong to different classes; it does not need to carry exact semantic labels.

Before I proceed, I'd like to confirm if metaseg has the capability to handle such a task. Your response would be highly valuable to me. Thank you for your time, and I'm looking forward to hearing from you.
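For reference, the H×W×1 output described above can be sketched with plain NumPy. This is only an illustration, not metaseg's own API: it assumes the masks are already available as boolean H×W arrays (for example, the `segmentation` entries returned by the upstream Segment Anything automatic mask generator), and the helper name `masks_to_label_map` is hypothetical.

```python
import numpy as np

def masks_to_label_map(masks, height, width):
    """Collapse a list of boolean H x W masks into one H x W x 1 label map.

    0 is background. IDs 1..N are assigned in descending order of mask
    area, so larger masks are painted first and smaller masks that
    overlap them stay visible on top.
    """
    label_map = np.zeros((height, width), dtype=np.int32)
    # Hypothetical ordering choice: biggest mask first.
    order = sorted(range(len(masks)), key=lambda i: int(masks[i].sum()), reverse=True)
    for new_id, i in enumerate(order, start=1):
        label_map[masks[i]] = new_id
    # Add the trailing channel axis to get H x W x 1.
    return label_map[..., np.newaxis]
```

The IDs produced this way only separate regions within one image; they carry no semantic meaning and are not comparable between images.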

CR400AF-A commented 1 year ago

I found the solution, but a new problem has emerged.

What I want to do is segment a video and label each class. My first idea was to map each mask_image color to a class label (you can see what I did for this below). However, I noticed that the colors in the output mask video change from frame to frame, which makes it difficult to track the labels (such as cookie, person, and so on). I checked your code and found that video frames are handled the same way as images, so this result is not surprising.

Therefore, I wonder if you could share some of your ideas regarding this. Thanks!

What I did (in sam_predictor.py, line 139):

```python
combined_mask = mask_image
# combined_mask = cv2.add(frame, mask_image)
out.write(combined_mask)
```
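One way to make a color stand for a label is to draw every frame from a single fixed palette indexed by an integer ID, instead of picking colors per frame. The sketch below is an assumption about how that could look, not what sam_predictor.py does; `make_palette` and `colorize` are hypothetical helpers, and the approach only helps once the IDs themselves are consistent between frames.

```python
import numpy as np

def make_palette(num_ids, seed=0):
    """Build a fixed (num_ids + 1) x 3 color table; row 0 is the background."""
    rng = np.random.default_rng(seed)  # fixed seed gives the same colors every run
    palette = rng.integers(0, 256, size=(num_ids + 1, 3)).astype(np.uint8)
    palette[0] = 0  # background stays black
    return palette

def colorize(label_map, palette):
    """Map an H x W integer label map to an H x W x 3 uint8 color image."""
    return palette[label_map]
```

Because the palette is built once and reused, a given ID is drawn in the same color in every frame, and the color can be mapped back to its ID by looking it up in the palette.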

CR400AF-A commented 1 year ago

Maybe this video can help you understand what happened. Take the person's arm as an example. I want to give these pixels a label based on some property (here it is the mask color, but the color changes over time). Is there a way to fix this? Thanks!
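A common workaround for labels that change over time is to match each mask in the current frame to the most overlapping mask from the previous frame and reuse its ID. The following is a minimal sketch of that idea using greedy IoU matching; it is not part of metaseg, the function names and the 0.5 threshold are assumptions, and it will lose track of objects that move quickly or get occluded.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def match_ids(prev_masks, curr_masks, next_id, threshold=0.5):
    """Carry IDs from the previous frame over to the current frame.

    prev_masks: dict mapping ID -> boolean mask from the previous frame.
    curr_masks: list of boolean masks for the current frame.
    Returns (dict ID -> mask for the current frame, updated next_id).
    """
    assigned = {}
    free = set(prev_masks)  # previous IDs not yet claimed in this frame
    for mask in curr_masks:
        best_id, best_score = None, threshold
        for pid in sorted(free):
            score = iou(prev_masks[pid], mask)
            if score >= best_score:
                best_id, best_score = pid, score
        if best_id is None:
            # No sufficiently overlapping previous mask: treat as a new object.
            best_id = next_id
            next_id += 1
        else:
            free.discard(best_id)
        assigned[best_id] = mask
    return assigned, next_id
```

For the first frame, call it with an empty dict and `next_id=1`; for each later frame, pass in the dict returned for the frame before it.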

https://github.com/kadirnar/segment-anything-video/assets/104341742/6011b64c-c376-41bd-b8d4-8537c1a3fb30

CR400AF-A commented 1 year ago

The video is too large (46 MB) to preview on GitHub. Here is a link: https://cloud.tsinghua.edu.cn/d/fefe751e32d549ad8aab/

Snnier commented 1 year ago

How did you do it, i.e. get an output where "1" means that pixels belong to different classes rather than carrying exact semantic labels?

CR400AF-A commented 1 year ago

Hello, I couldn't get it to work with this method. You could have a look at issue #92, where I describe some approaches to this problem.