Closed yyyyy-aa closed 4 months ago
In a real semi-supervised training scenario, are the train_images folders under labeled_root and unlabeled_root the same folder?
Hi @yyyyy-aa
Thanks for raising this issue. I have updated the repo so that it is easier to understand and so that real semi-supervision scenarios are easier to set up. Please pull the latest repo and follow the updated instructions in the README file.
The major changes include:
- We add support for a new data pipeline instance that directly supports real SSL scenarios;
- We change the storage of labeled and unlabeled training data.
Please follow the latest instructions.
> In a real semi-supervised training scenario, are the train_images folders under labeled_root and unlabeled_root the same folder?
No. In `labeled_root`, you should store the labeled training and validation images, while in `unlabeled_root`, you should store the unlabeled training images.
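Then, in the real semi-supervised scenario, do I need to split the labeled and unlabeled training data myself, i.e. the train_images under labeled_root and unlabeled_root? (For example, out of 80 labeled training images, keep 8 or 16 in train_images under labeled_root and move the rest into train_images under unlabeled_root?)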
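> Then, in the real semi-supervised scenario, do I need to split the labeled and unlabeled training data myself, i.e. the train_images under labeled_root and unlabeled_root? (For example, out of 80 labeled training images, keep 8 or 16 in train_images under labeled_root and move the rest into train_images under unlabeled_root?)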
Imagine you have 20 labeled images with 20 manual annotations and 60 unlabeled images (with no manual annotations) for training, plus 50 labeled images for validation:
- Put the 20 labeled training images in `labeled_root/train_images` and the 20 corresponding masks in `labeled_root/train_labels`;
- Put the 50 validation images in `labeled_root/val_images` and the 50 corresponding masks in `labeled_root/val_labels`;
- Put the 60 unlabeled training images in `unlabeled_root/train_images`.
In short: put your labeled data in `labeled_root` and your unlabeled data in `unlabeled_root`. It should be that simple.
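As a rough way to double-check a layout like the one described above, the following sketch counts the files in each folder and reports any labeled split whose image and mask counts differ. The folder names come from this thread; `count_files` and `check_layout` are hypothetical helpers written for illustration and are not part of the repo.

```python
from pathlib import Path

# Folder names follow the layout described in this thread.
# The helpers below are hypothetical and are NOT part of the repo.
EXPECTED_DIRS = {
    "labeled": ["train_images", "train_labels", "val_images", "val_labels"],
    "unlabeled": ["train_images"],
}


def count_files(labeled_root, unlabeled_root):
    """Return a dict mapping "<kind>/<subfolder>" to its number of files."""
    roots = {"labeled": Path(labeled_root), "unlabeled": Path(unlabeled_root)}
    counts = {}
    for kind, subdirs in EXPECTED_DIRS.items():
        for sub in subdirs:
            folder = roots[kind] / sub
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            counts[f"{kind}/{sub}"] = sum(1 for p in folder.iterdir() if p.is_file())
    return counts


def check_layout(counts):
    """Return a list of problems; empty if every labeled image has a mask."""
    problems = []
    for split in ("train", "val"):
        n_images = counts[f"labeled/{split}_images"]
        n_masks = counts[f"labeled/{split}_labels"]
        if n_images != n_masks:
            problems.append(
                f"{split}: {n_images} images but {n_masks} masks in labeled_root"
            )
    return problems


if __name__ == "__main__":
    # Paths are placeholders; point them at your own data.
    counts = count_files("labeled_root", "unlabeled_root")
    print(counts)
    print(check_layout(counts) or "layout looks consistent")
```

For the example above, the counts would be 20/20 for the labeled training images and masks, 50/50 for validation, and 60 for the unlabeled training images.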
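Hello, I may not have expressed myself clearly. What I want to ask is: with the settings commonly used in semi-supervised learning (10% labeled, 20% labeled), is there no way in the real semi-supervised scenario code to split the training images into labeled and unlabeled by setting ratio=0.1 or 0.2? Do I need to randomly move a certain number of images from the labeled folder (80 images) into the unlabeled folder myself to achieve this split?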
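> Hello, I may not have expressed myself clearly. What I want to ask is: with the settings commonly used in semi-supervised learning (10% labeled, 20% labeled), is there no way in the real semi-supervised scenario code to split the training images into labeled and unlabeled by setting ratio=0.1 or 0.2? Do I need to randomly move a certain number of images from the labeled folder (80 images) into the unlabeled folder myself to achieve this split?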
That's correct.
Setting the labeled data ratio to 10% or 20% is a common technique for evaluating the effectiveness of semi-supervised learning algorithms. Given a fixed amount of available data, a better SSL model should yield better performance when the ratio of labeled data is lower. However, since we need to change the ratio of labeled data, we require that all of the available data are actually labeled, so that any sample in the dataset can play the role of "labeled data" and we can change the ratio freely without modifying the IO code or the dataset itself.
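To make the ratio-based evaluation setting concrete, here is a minimal sketch of how a fully labeled training set could be split by a ratio. It only illustrates the idea and is not the repo's actual implementation; `split_by_ratio` and its `seed` parameter are hypothetical names.

```python
import random


def split_by_ratio(sample_ids, ratio, seed=0):
    """Split a fully labeled set into a labeled part and a pseudo-unlabeled part.

    Hypothetical helper for illustration only (not the repo's code).
    Every sample has a mask, so any sample may land on either side; the masks
    of the "unlabeled" part are simply ignored during training.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ids = sorted(sample_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_labeled = max(1, round(len(ids) * ratio))
    return ids[:n_labeled], ids[n_labeled:]


if __name__ == "__main__":
    all_ids = [f"case_{i:03d}" for i in range(80)]  # 80 labeled training images
    labeled, unlabeled = split_by_ratio(all_ids, ratio=0.1)
    print(len(labeled), len(unlabeled))  # 8 72
```

Because the split is computed in code, changing the ratio from 0.1 to 0.2 needs no change to the files on disk, which is the point made above.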
The reason we provide support for the real semi-supervision scenario is that if you have some unlabeled data that could potentially boost segmentation performance, they should be stored separately so that the IO process "knows" which data are labeled and which are not, and can therefore apply different IO and preprocessing to each.
However, it makes no sense to set different ratios of labeled data in our "real semi-supervision scenario", as it is NOT designed for that. If you want to change the ratio, simply change the `RATIO` argument in the `.cfg` files stored in the `configs` folder.
Another personal suggestion: please read the related papers, the whole README file, and other issues before opening a new issue, and keep the issue concise and precise. This reduces unnecessary communication and saves time for all of the collaborators in the repo.
Also, please respect others' time by not opening too many issues that are related to debugging.
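My data paths are as follows:
Following the instructions in your README, I replaced the code with yours to train in the real semi-supervised scenario (I set image_root to image_root = './data/LAA').
The error message is as follows:
Besides the code that defines the data, is there anything else that needs to be modified?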