mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
MIT License
711 stars 142 forks

Demo predict the video or capture #70

Closed KennyChen880127 closed 11 months ago

KennyChen880127 commented 1 year ago

Hello everyone! Does anyone know how to run prediction on a video (mp4) or a live capture? This issue is very important to me.

I am adapting this notebook: https://github.com/mhamilton723/STEGO/blob/master/src/STEGO_Colab_Demo.ipynb

My idea is to use torchvision.io.read_video(video) to get the returned tuple and then convert it to a tensor, but it does not work. I hope someone can help me!
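For reference, `torchvision.io.read_video` returns a tuple of three elements (video frames, audio frames, metadata), and the first element is already a tensor, so the tuple itself does not need converting. Below is a minimal sketch of unpacking it. It assumes a torchvision version that still ships `read_video` and the default `[T, H, W, C]` uint8 frame layout. The helper names `frames_to_model_input` and `load_video_frames` are hypothetical, and any resizing or normalization the model expects still has to be applied afterwards.

```python
def frames_to_model_input(frames_thwc):
    """Convert uint8 frames [T, H, W, C] to float frames [T, C, H, W] in [0, 1]."""
    return frames_thwc.permute(0, 3, 1, 2).float() / 255.0


def load_video_frames(path):
    """Read a whole video into memory and return (frames, fps)."""
    # Imported lazily so the helper above can be used without torchvision.
    from torchvision.io import read_video

    # read_video returns a 3-tuple: (video frames, audio frames, metadata dict).
    video, _audio, info = read_video(path, pts_unit="sec")
    return frames_to_model_input(video), info.get("video_fps")
```

Note that this loads every frame at once, which may not fit in memory for long videos.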

mhamilton723 commented 1 year ago

@KennyChen880127 Yes, that would indeed be the first step.

In particular:

  1. Load the video with torchvision
  2. Transform each video frame
  3. Apply STEGO to each video frame
  4. Apply the plotting code to each frame. For guidance, see the matplotlib video maker I have in https://github.com/mhamilton723/STEGO/blob/master/src/plot_dino_correspondence.py

If you make a nice video tool, I'm happy to accept it in a PR.
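The steps above can be sketched as a small loop. This is only an illustration: `segment_video`, `save_frames_as_video`, and the three callables are hypothetical names, not part of STEGO. The saving helper uses matplotlib's generic `FuncAnimation` rather than the repository's own video maker, and it assumes each rendered frame is an RGB array of shape (H, W, 3).

```python
def segment_video(frames, transform, segment, render):
    """Run steps 2-4 on an iterable of frames and return the rendered frames."""
    rendered = []
    for frame in frames:
        model_input = transform(frame)              # step 2: per-frame transform
        prediction = segment(model_input)           # step 3: apply the model
        rendered.append(render(frame, prediction))  # step 4: plot / colorize
    return rendered


def save_frames_as_video(rendered, path, fps=10):
    """Write rendered RGB frames to a video file with matplotlib."""
    # Imported lazily so segment_video can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    fig, ax = plt.subplots()
    ax.axis("off")
    image = ax.imshow(rendered[0])

    def update(i):
        # Swap the pixel data instead of redrawing the whole figure.
        image.set_data(rendered[i])
        return (image,)

    animation = FuncAnimation(fig, update, frames=len(rendered), blit=True)
    animation.save(path, fps=fps)  # writing .mp4 typically requires ffmpeg
    plt.close(fig)
```

Passing the model call in as `segment` keeps the loop independent of how the model is loaded, so it can be loaded once outside the loop.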

KennyChen880127 commented 1 year ago

@mhamilton723 Thank you for your reply! I will refer to your suggestion!

KennyChen880127 commented 1 year ago

I'm sorry to bother you again, sir. I followed your steps and succeeded in running prediction on the video, but I found that the fps is very low. I guess this is because LitUnsupervisedSegmenter is loaded again when predicting every frame? Could you show me an easy way to run prediction? I really need this code.
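If the model really is being reloaded for every frame, one common remedy is to load it once, keep it on the GPU, and push frames through in batches with gradients disabled. The sketch below shows that pattern under stated assumptions: `batched` and `predict_video` are hypothetical helpers, `model` stands in for any `torch.nn.Module` that maps a batch `[B, C, H, W]` to per-pixel scores `[B, K, h, w]`, and the STEGO-specific probing and post-processing from the demo notebook are left out. Whether reloading is the actual bottleneck is also an assumption; other per-frame steps, such as any CRF refinement or plotting, can dominate instead.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_video(model, frames, batch_size=8, device="cuda"):
    """Segment all frames with one already-loaded model.

    `frames` is a float tensor [T, C, H, W]; the result is an integer
    tensor [T, h, w] holding the highest-scoring class per pixel.
    """
    # Imported lazily so `batched` can be used without torch installed.
    import torch

    model = model.to(device).eval()  # move to the device once, not per frame
    predictions = []
    with torch.no_grad():            # inference only: skip autograd bookkeeping
        for batch in batched(frames, batch_size):
            scores = model(batch.to(device))
            predictions.append(scores.argmax(dim=1).cpu())
    return torch.cat(predictions)
```

Timing the model call, the post-processing, and the plotting separately would show which step actually limits the frame rate.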