Why aren't features extracted for every 2D input image? I don't see a for loop over the frames.
I read your code, and as far as I can tell it extracts features from a 2D image and performs 2D segmentation through the encoder-decoder. How does this become 3D video segmentation? Is it just 2D segmentation with the results stacked? And is the T you set the number of objects?