Do the test results on several images rather than whole videos represent the performance of video semantic segmentation methods?

jfzhuang / IFR

[CVPR'22] Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction

MIT License


Do the test results on several images rather than whole videos represent the performance of video semantic segmentation methods? #2

Closed imzhangyd closed 1 year ago

imzhangyd commented 1 year ago

Here is another point I'm confused about. The task of video semantic segmentation is to segment every frame of a video. But only a few frames are labeled in the test set, so the test performance in the experiments is measured on a few images rather than on whole videos. I don't think this can represent the performance of video semantic segmentation methods. Did I misunderstand something here?

jfzhuang commented 1 year ago

Evaluating models on sampled key frames with labels is a compromise: existing datasets cannot provide per-frame labels for evaluation because annotation is too costly. You are correct that it would be more reasonable to evaluate on every frame if per-frame labels were available. In addition, some recent works evaluate temporal consistency (TC), which measures how consistent the segmentation results are across frames. In our paper, we provide TC scores in Table 9.
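For readers who want to see the two kinds of evaluation side by side, here is a minimal sketch in plain Python. It is not the code from this repository. It assumes each mask is a nested list of class ids, that unlabeled frames are marked with `None`, and that label id 255 means "ignore" (a Cityscapes-style convention). The `warp` argument is a hypothetical hook: one common formulation of TC warps the previous prediction to the current frame with optical flow before comparing, and the identity default here only stands in for that step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Mask = List[List[int]]  # H x W grid of class ids
IGNORE = 255            # assumed "ignore" label id (Cityscapes-style convention)


def confusion(pred: Mask, target: Mask, num_classes: int) -> List[List[int]]:
    """Count (target, pred) class pairs, skipping pixels whose target is IGNORE."""
    mat = [[0] * num_classes for _ in range(num_classes)]
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t != IGNORE:
                mat[t][p] += 1
    return mat


def miou(mat: List[List[int]]) -> float:
    """Mean IoU over the classes that appear in either prediction or target."""
    n = len(mat)
    ious = []
    for c in range(n):
        tp = mat[c][c]
        fn = sum(mat[c]) - tp
        fp = sum(mat[r][c] for r in range(n)) - tp
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


def evaluate_labeled_frames(
    preds: Sequence[Mask], labels: Sequence[Optional[Mask]], num_classes: int
) -> Tuple[float, int]:
    """mIoU accumulated only over frames that have a label; others are skipped."""
    total = [[0] * num_classes for _ in range(num_classes)]
    used = 0
    for pred, label in zip(preds, labels):
        if label is None:  # unlabeled frame: nothing to compare against
            continue
        mat = confusion(pred, label, num_classes)
        for r in range(num_classes):
            for c in range(num_classes):
                total[r][c] += mat[r][c]
        used += 1
    return miou(total), used


def temporal_consistency(
    preds: Sequence[Mask],
    num_classes: int,
    warp: Callable[[Mask, int], Mask] = lambda prev, idx: prev,
) -> float:
    """Average mIoU between each prediction and the (warped) previous prediction.

    `warp(prev_pred, idx)` is a hypothetical hook for flow-based warping;
    the identity default assumes a static scene and is only for illustration.
    """
    scores = []
    for i in range(1, len(preds)):
        warped_prev = warp(preds[i - 1], i)
        scores.append(miou(confusion(preds[i], warped_prev, num_classes)))
    return sum(scores) / len(scores) if scores else 0.0


# Three 2x2 frames; only the middle one is annotated.
preds = [[[0, 0], [1, 1]], [[0, 1], [1, 1]], [[1, 1], [1, 1]]]
labels = [None, [[0, 0], [1, 1]], None]

score, used = evaluate_labeled_frames(preds, labels, num_classes=2)
tc = temporal_consistency(preds, num_classes=2)
print(f"mIoU on {used} labeled frame(s): {score:.4f}")
print(f"TC over {len(preds) - 1} frame pairs: {tc:.4f}")
```

In this toy example the mIoU uses only one of the three frames, while the TC score uses every consecutive pair and needs no labels, which is why the two metrics complement each other.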

imzhangyd commented 1 year ago

Thanks for your patience. I was wondering whether the main difference between video semantic segmentation and semi-supervised video semantic segmentation is whether unlabeled video is used for training.

jfzhuang commented 1 year ago

Yes, you are correct.

imzhangyd commented 1 year ago

Thank you very much! Some VSS methods aggregate features from neighboring unlabeled frames to segment the current frame, so I think these methods also use unlabeled frames for training and could be considered semi-supervised. Did I misunderstand something here?