pascalcpp / SDCL

SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation

Unlabeled data #1

Open carlotita22 opened 1 month ago

carlotita22 commented 1 month ago

Hi!

Thanks for your work! I have a question about the unlabeled data, because the dataloader (dataset) expects a label for every case:

```python
def __getitem__(self, idx):
    image_name = self.image_list[idx]
    h5f = h5py.File(self._base_dir + "/2018LA_Seg_Training Set/" + image_name + "/mri_norm2.h5", 'r')
    # h5f = h5py.File(self._base_dir + "/" + image_name + "/mri_norm2.h5", 'r')
    image = h5f['image'][:]
    label = h5f['label'][:]
    sample = {'image': image, 'label': label}
    if self.transform:
        sample = self.transform(sample)
    return sample
```

But I don't have labels for the unlabeled data. What can I do?

pascalcpp commented 1 month ago

Because the label of the unlabeled data is not actually used in practice, you can create a random dummy label with the same shape as the image to get around this.
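For example, here is a minimal sketch of that workaround, assuming each unlabeled case is an `mri_norm2.h5` file that only contains an `image` dataset (`add_dummy_label` is a hypothetical helper, not part of this repository):

```python
import h5py
import numpy as np

def add_dummy_label(h5_path):
    """Write an all-zero 'label' dataset matching the image shape into an unlabeled case."""
    with h5py.File(h5_path, 'r+') as h5f:
        image = h5f['image'][:]
        if 'label' not in h5f:
            # The values are never used as supervision for unlabeled data; only the shape matters.
            h5f.create_dataset('label', data=np.zeros(image.shape, dtype=np.uint8))
```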

carlotita22 commented 1 month ago

Thank you very much for your response. I have a question about the test data: do you also use it for validation? During training you compute the Dice score on the test data, so I'm not sure whether it plays a role in the training process.

pascalcpp commented 1 month ago

The supervision signals for the unlabeled data all come from pseudo labels.
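To illustrate the idea, here is a minimal sketch of how such pseudo labels are typically produced in semi-supervised segmentation (an assumption about the general mechanism, not the exact SDCL code): the unlabeled batch is supervised by the argmax of the network's softmax predictions.

```python
import torch

def make_pseudo_labels(unlabeled_logits):
    """unlabeled_logits: (B, C, D, H, W) raw network outputs on an unlabeled batch."""
    with torch.no_grad():
        probs = torch.softmax(unlabeled_logits, dim=1)
        pseudo_labels = torch.argmax(probs, dim=1)  # (B, D, H, W) hard pseudo labels
    return pseudo_labels
```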

carlotita22 commented 1 month ago

Hi, my question is more general. The dataset is divided into test, train_labels, and train_unlabels, and I see that the test data is used to keep track of the best Dice score, which effectively means it is validating the training. So I believe that data cannot also be used for inference, because it already acts as a validation set during training.

In fact, when I ran inference on the test data the results were quite good (approx. Dice = 0.84 for myocardial segmentation with 8 labeled and 46 unlabeled cases), but when I use new subjects (not used as validation during training), the inference results are lower. Does that make sense? I also want to thank you for your GitHub, it's great! And please, if I am mistaken, I would appreciate your explanation :)
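For reference, one way to keep the test set untouched until final inference is to hold out a small validation split from the training cases for model selection; a minimal sketch (the helper and the split fraction here are hypothetical, not from the repository):

```python
import random

def split_patients(train_cases, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training cases for model selection (best-Dice tracking)."""
    cases = sorted(train_cases)
    random.Random(seed).shuffle(cases)
    n_val = max(1, int(len(cases) * val_fraction))
    return cases[n_val:], cases[:n_val]  # (training split, validation split)
```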