Training in real semi-supervision scenario: training error

yyyyy-aa commented 4 months ago

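My data paths are as follows:

Following the instructions in your README, I replaced the code as described and used it to train in the real semi-supervision scenario (I set image_root to image_root = './data/LAA').

The error message is as follows:

Apart from the code that defines the data, is there anything else that needs to be modified?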

yyyyy-aa commented 4 months ago

In the real semi-supervised training scenario, are the train_images folders under labeled_root and unlabeled_root the same folder?

hsiangyuzhao commented 4 months ago

Hi @yyyyy-aa

Thanks for this issue. I have updated the repo so that real semi-supervision scenarios are easier to understand and set up. Please pull the latest repo and follow the updated instructions in the README file.

The major changes include:

We added a new data pipeline instance that directly supports real SSL scenarios;
We changed how labeled and unlabeled training data are stored.

Please follow the latest instructions.

In the real semi-supervised training scenario, are the train_images folders under labeled_root and unlabeled_root the same folder?

No. In labeled_root, you should store the labeled training and validation images; in unlabeled_root, you should store the unlabeled training images.

yyyyy-aa commented 4 months ago

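Then, in the real semi-supervision scenario, do I need to split the labeled and unlabeled training data (that is, train_images under labeled_root and unlabeled_root) myself? For example, out of 80 labeled training images, keep 8 or 16 in train_images under labeled_root and move the rest to train_images under unlabeled_root?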

hsiangyuzhao commented 4 months ago

Imagine you have 20 labeled images with 20 manual annotations and 60 unlabeled images (with no manual annotations) for training, plus 50 labeled images for validation:

Put the 20 labeled images in labeled_root/train_images and the 20 corresponding masks in labeled_root/train_labels;
Put the 50 labeled images in labeled_root/val_images and the 50 corresponding masks in labeled_root/val_labels;
Put the 60 unlabeled images in unlabeled_root/train_images.

In short: put your labeled data in labeled_root and your unlabeled data in unlabeled_root. I think this should be quite straightforward.
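As a rough sanity check of the layout described above, a small script along these lines could count the files in each folder. This is only a sketch: the sub-folder names come from this reply, while `summarize_layout` and `check_layout` are hypothetical helpers that are not part of the repo.

```python
from pathlib import Path

# Sub-folder names are taken from the reply above; the helpers are hypothetical.
LABELED_SUBDIRS = ("train_images", "train_labels", "val_images", "val_labels")
UNLABELED_SUBDIRS = ("train_images",)


def count_files(folder: Path) -> int:
    """Count regular files directly inside `folder` (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file())


def summarize_layout(labeled_root, unlabeled_root):
    """Return a dict mapping 'labeled/<subdir>' and 'unlabeled/<subdir>' to file counts."""
    summary = {}
    for name in LABELED_SUBDIRS:
        summary[f"labeled/{name}"] = count_files(Path(labeled_root) / name)
    for name in UNLABELED_SUBDIRS:
        summary[f"unlabeled/{name}"] = count_files(Path(unlabeled_root) / name)
    return summary


def check_layout(summary):
    """Return a list of problems; an empty list means the counts look consistent."""
    problems = []
    if summary["labeled/train_images"] != summary["labeled/train_labels"]:
        problems.append("train_images and train_labels counts differ")
    if summary["labeled/val_images"] != summary["labeled/val_labels"]:
        problems.append("val_images and val_labels counts differ")
    if summary["unlabeled/train_images"] == 0:
        problems.append("no unlabeled training images found")
    return problems
```

For the example numbers above, the summary would be expected to report 20/20 for the labeled training folders, 50/50 for validation, and 60 for the unlabeled folder, with no problems listed.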

yyyyy-aa commented 4 months ago

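Hello, I may not have expressed myself clearly. What I want to ask is: with the settings commonly used in semi-supervised learning (10% labeled, 20% labeled), is there no way in the real semi-supervision scenario code to set ratio=0.1 or 0.2 to divide the training images into labeled and unlabeled? Do I instead have to randomly move a certain number of images from the labeled folder (80 images) to the unlabeled folder myself to achieve this split?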

hsiangyuzhao commented 4 months ago

That's correct.

Setting the labeled data ratio to 10% or 20% is a common way to evaluate the effectiveness of semi-supervised learning algorithms. Given a fixed amount of available data, a better SSL model should yield better performance when the ratio of labeled data is lower. However, since we need to change the ratio of labeled data, we require that all of the available data be actually labeled, so that any sample in the dataset can play the role of "labeled data" and we can change the ratio freely without modifying the IO code or the dataset itself.
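To illustrate the idea of a ratio-based split on a fully labeled dataset, here is a minimal sketch. It is not the repo's implementation: `split_by_ratio`, its `seed` parameter, and the rounding rule are assumptions made for this example.

```python
import random


def split_by_ratio(case_ids, ratio, seed=0):
    """Split fully labeled cases into a labeled subset and a subset treated as unlabeled.

    Hypothetical helper: the rounding rule and seeding are illustrative assumptions.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ids = sorted(case_ids)              # fixed order so the split is reproducible
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_labeled = max(1, round(len(ids) * ratio))
    # The labels of the second subset are simply ignored during training.
    return ids[:n_labeled], ids[n_labeled:]
```

With 80 labeled training cases, a ratio of 0.1 keeps 8 cases as labeled and a ratio of 0.2 keeps 16, and no files need to be moved on disk.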

The reason we support the real semi-supervision scenario is that if you have unlabeled data that could potentially boost segmentation performance, it should be stored separately so that the IO process "knows" which data are labeled and which are not, and can therefore apply different IO and preprocessing to each.
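A minimal sketch of what "knowing" which data are labeled could look like is given below. It is not the repo's data pipeline: `build_training_records` is a hypothetical helper, and it assumes that each mask in train_labels has the same file name as its image in train_images.

```python
from pathlib import Path


def build_training_records(labeled_root, unlabeled_root):
    """Return one record per training image, flagged as labeled or unlabeled.

    Assumption (not confirmed by the repo): a mask shares its image's file name.
    """
    records = []
    image_dir = Path(labeled_root) / "train_images"
    label_dir = Path(labeled_root) / "train_labels"
    for image_path in sorted(image_dir.iterdir()):
        label_path = label_dir / image_path.name
        if not label_path.is_file():
            raise FileNotFoundError(f"missing mask for {image_path.name}")
        records.append({"image": image_path, "label": label_path, "is_labeled": True})
    # Unlabeled images carry no mask, so downstream code can branch on `is_labeled`.
    for image_path in sorted((Path(unlabeled_root) / "train_images").iterdir()):
        records.append({"image": image_path, "label": None, "is_labeled": False})
    return records
```

A loader built on such records could read and preprocess the mask only when `is_labeled` is true.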

However, it makes no sense to set different ratios of labeled data in our "real semi-supervision scenario", as it is NOT designed for that. If you want to change the ratio, you can simply change the RATIO argument in the .cfg files stored in the configs folder.

hsiangyuzhao commented 4 months ago

Another personal suggestion: please read the related papers, the whole README file, and other issues before opening a new issue, and keep the issue concise and exact. This reduces unnecessary communication and saves time for all of the collaborators in the repo.

Also, please respect others' time by not opening too many issues that are related to debugging.

hsiangyuzhao / RCPS

Training in real semi-supervision scenario: training error #18