About real semi-supervised scene

ghost commented 1 year ago

Hi, thank you for the README update, and congratulations on the acceptance!

What is the difference between a real semi-supervised scenario and changing label ratios? Just to confirm: when changing label ratios, the model does not use the labels of the data treated as unlabeled to compute the segmentation loss, so isn't this the same as a real semi-supervised scenario?

A follow-up question: will precision change (drop) when switching to a real semi-supervised scenario, compared to the results reported in the paper?

Many thanks.

hsiangyuzhao commented 1 year ago

Hi @YuxuanWen-Code,

Thanks for your interest in our research. The difference between a real semi-supervised scenario and changing label ratios is that, in the latter case, all of the training data is labeled, but some of it is treated as unlabeled: those labels are ignored and never used for training.
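A minimal sketch of what the label-ratio setting amounts to, assuming each sample is an (image path, label path) pair. The function name `split_by_label_ratio`, the record layout and the fixed seed are hypothetical and not taken from the RCPS code; the point is only that every sample starts out labeled and the labels of the "unlabeled" subset are discarded before training.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical sample record: (image_path, label_path); label_path is None when unlabeled.
Sample = Tuple[str, Optional[str]]


def split_by_label_ratio(
    samples: List[Sample], labeled_ratio: float, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split a fully labeled dataset into a labeled and a pseudo-unlabeled subset.

    Every input sample must carry a label; the labels of the unlabeled subset
    are dropped so they can never reach the supervised loss.
    """
    if any(label is None for _, label in samples):
        raise ValueError("label-ratio experiments expect every sample to be labeled")
    if not 0.0 < labeled_ratio <= 1.0:
        raise ValueError("labeled_ratio must be in (0, 1]")
    shuffled = samples[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_labeled = round(len(shuffled) * labeled_ratio)
    labeled = shuffled[:n_labeled]
    # The remaining samples keep their images but lose their labels.
    unlabeled = [(image, None) for image, _ in shuffled[n_labeled:]]
    return labeled, unlabeled


if __name__ == "__main__":
    data = [(f"img_{i}.nii.gz", f"seg_{i}.nii.gz") for i in range(100)]
    labeled, unlabeled = split_by_label_ratio(data, labeled_ratio=0.1)
    print(len(labeled), len(unlabeled))  # 10 90
```

Dropping the labels explicitly, rather than merely ignoring them later, makes it impossible for the supervised loss to see them by accident.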

We separate these two scenarios because, when building datasets with our TrainValDataPipeline class, the input images must be either all labeled or all unlabeled (this is determined by whether you pass mode=labeled or mode=unlabeled). If your data is only partly labeled (the real semi-supervised scenario), you cannot pass mode=labeled, because in that case the data pipeline requires every sample to have a label and raises an error when a label is missing. Handling the two cases separately avoids modifying the IO code, so it is the simpler and cheaper option.
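To illustrate the all-or-nothing rule, here is a toy stand-in for a mode-based pipeline. `ToyDataPipeline` and `build_real_semi_supervised` are hypothetical names, and the real TrainValDataPipeline constructor and error types may differ; the sketch only shows why partly labeled data has to be partitioned into a labeled pipeline and an unlabeled one instead of being passed as a whole with mode=labeled.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, Optional[str]]  # hypothetical record: (image_path, label_path or None)


class ToyDataPipeline:
    """Hypothetical pipeline built with mode='labeled' or mode='unlabeled'.

    NOT the RCPS TrainValDataPipeline; it only mimics the rule that a
    pipeline's inputs must be either all labeled or treated as all unlabeled.
    """

    def __init__(self, samples: List[Sample], mode: str):
        if mode not in ("labeled", "unlabeled"):
            raise ValueError(f"unknown mode: {mode!r}")
        if mode == "labeled":
            missing = [image for image, label in samples if label is None]
            if missing:
                # Labeled mode requires a label for every sample.
                raise ValueError(f"{len(missing)} sample(s) have no label, e.g. {missing[0]}")
        self.mode = mode
        # In unlabeled mode, labels are never exposed, even if they exist.
        self.samples = samples if mode == "labeled" else [(img, None) for img, _ in samples]

    def __len__(self) -> int:
        return len(self.samples)


def build_real_semi_supervised(samples: List[Sample]) -> Tuple[ToyDataPipeline, ToyDataPipeline]:
    """Partition partly labeled data so that each pipeline sees a homogeneous subset."""
    labeled = [s for s in samples if s[1] is not None]
    unlabeled = [s for s in samples if s[1] is None]
    return ToyDataPipeline(labeled, "labeled"), ToyDataPipeline(unlabeled, "unlabeled")
```

With 10 labeled and 90 unlabeled images, `ToyDataPipeline(mixed, "labeled")` raises, while `build_real_semi_supervised(mixed)` returns pipelines of length 10 and 90.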

In short, the two cases are separated only for simplicity, and they serve different purposes:

In the latter case (changing label ratios), you can test the effectiveness of the semi-supervised algorithm and check how its performance changes as the proportion of unlabeled data varies.
In the former case (real semi-supervised scenario), you can use the algorithm to improve segmentation performance over training on the labeled data alone.

However, choosing one scenario over the other does NOT alter model effectiveness. Consider these two cases:

100 labeled images (labeled ratio is set to 10%)
10 labeled images with 90 unlabeled ones

The model should yield very similar performance in both, as they are essentially identical. However, you should still treat the two cases differently, because our code is designed that way :)
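Why the two setups should behave alike can be seen from the generic form of a semi-supervised objective. The formula below is an illustrative template, not the exact RCPS loss (its unsupervised term is defined in the paper); the symbols D_l and D_u (labeled and unlabeled subsets), the model f, the per-sample losses and the weight λ are introduced here for illustration only.

```latex
\mathcal{L}
  = \frac{1}{|\mathcal{D}_l|} \sum_{(x, y) \in \mathcal{D}_l} \ell_{\mathrm{sup}}\big(f(x), y\big)
  + \lambda \, \frac{1}{|\mathcal{D}_u|} \sum_{x \in \mathcal{D}_u} \ell_{\mathrm{unsup}}(f, x)
```

Ground-truth labels y appear only in the first term, so for the same choice of labeled images it makes no difference to the objective whether the images in D_u had labels that were discarded (100 labeled images at a 10% ratio) or never had labels at all (10 labeled plus 90 unlabeled).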

ghost commented 1 year ago

Thank you for such a detailed and timely explanation; it makes perfect sense.

I will close this issue now since my confusion is resolved. Best regards, and thank you again for your patience and effort.

hsiangyuzhao / RCPS, issue #11