Alexanderisgod closed this issue 1 year ago
The method would only be unsupervised if the motion supervision came from optical flow. However, that does not seem to be the case.
Hi Alexanderisgod,
Thanks for your interest. The main motivation of this paper is indeed to guide the slot masks with motion cues (motion segments).
In both our paper and the most recent update, we use both ground-truth segmentation masks and estimated motion segments (which are obtained from self-supervised flow and a model pre-trained on the toy FlyingThings dataset).
We agree that even the lightweight estimated motion still needs some minimal supervision from the toy dataset, but that model had no access to any ground truth from the target dataset. On the KITTI dataset, we did not use any ground-truth segments and still achieve reasonable results.
Thanks a lot for pointing out this shortcoming of our model. We are aiming to reduce the supervision further, until we achieve fully self-supervised object discovery.
Thanks, Zhipeng
In the paper, you used segmentation masks for supervision, so why is it called an unsupervised method?