jeya-maria-jose / Medical-Transformer

Official Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" - MICCAI 2021

Multi-label Segmentation #51

Open shanpriya3 opened 2 years ago

shanpriya3 commented 2 years ago

Hi, I have a ground truth with 3 classes (including background), with values 0, 127, 255. As mentioned in #43, I changed num_classes=3 in axialnet.py. In utils.py, https://github.com/jeya-maria-jose/Medical-Transformer/blob/703a080d66d16673be1b8770bca143956c9f0e8a/utils.py#L156-L157

these lines map the ground truth to values 0 and 1, but I need 0, 1, and 2 for my case (3 classes).

I tried doing this:

```python
mask[mask < 127] = 0
mask[mask == 127] = 1
mask[mask > 127] = 2
```

But I got this error (screenshot attached). Could you please help me with this?

jeya-maria-jose commented 2 years ago

You should remove those lines involving mask if you are converting it to a multi-class problem. Your ground truth should just contain pixel values 0, 1, and 2 if you are working on a 3-class classification problem.
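
For illustration, here is a minimal sketch of that remapping, assuming the grayscale mask values {0, 127, 255} are loaded as a NumPy array in the dataset code; the function and variable names are hypothetical, not part of the repository:

```python
import numpy as np

# Hypothetical sketch: convert a grayscale mask with values {0, 127, 255}
# into integer class indices {0, 1, 2} before the dataset returns it.
def remap_mask(mask: np.ndarray) -> np.ndarray:
    """Map grayscale label values to class indices 0, 1, 2."""
    remapped = np.zeros_like(mask, dtype=np.int64)
    remapped[mask == 127] = 1   # class 1
    remapped[mask == 255] = 2   # class 2
    # everything else (value 0) stays background = 0
    return remapped
```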

shanpriya3 commented 2 years ago

Hi, thanks for your response. I did remove those 2 lines from my code. I have 3 classes/labels in total (including background), which have the values 0, 1, 2 in my ground truth. I also changed num_classes=3 in axialnet.py, but when I run the code I get this error (screenshot attached). Does it have to do with the loss function? Do I need to change anything else? Could you please help me with this error?

shanpriya3 commented 2 years ago

Could you please explain what these lines (189-192) in train.py do?

```python
tmp[tmp >= 0.5] = 1
tmp[tmp < 0.5] = 0
tmp2[tmp2 > 0] = 1
tmp2[tmp2 <= 0] = 0
```

and also why you do this (205-206)?

```python
yHaT[yHaT == 1] = 255
yval[yval == 1] = 255
```

I have to remove these lines for my case, right?

Qiang19990514 commented 2 years ago

May I ask which dataset you used for this?

rw404 commented 1 year ago

@shanpriya3, the code in lines 189-192 applies a hard 0.5 threshold - i.e. it converts all predictions into binary form (either 0 or 1) so that the mask can be stored in the format described in the repository's README (a value of 255 corresponds to the object, 0 to the background).

Lines 205-206 are needed for the mask saving format described in the repository:

  1. Based on the image, the model builds a response map: y_out = model(X_batch) on line 184;
  2. The output is converted to NumPy format. It is then assumed that the model output is a probability map of whether each pixel belongs to each object class: the input of shape [batch_size, channels, width, height] is mapped to an output of shape [batch_size, num_classes, width, height] (here num_classes = 3), where every position holds a number between 0 and 1 and summing over the class dimension (dim=1) gives a map of ones ([batch_size, num_classes, width, height].sum(dim=1) == 1 * [batch_size, width, height] - the description is informal, just to add interpretability). HOWEVER:
    • criterion = LogNLLLoss() is used as the loss (line 111), but this criterion, defined on line 9 of metrics.py, actually implements cross-entropy rather than an NLL over log-probabilities: softmax is applied to the raw prediction model(input) inside the criterion itself, so no softmax is used in the model's _forward_impl / forward methods.
    • Therefore, in the validation part, before calling tmp[tmp>=0.5] = 1 on line 189 you need to apply a softmax transformation so that the raw model prediction can be interpreted as probabilities, i.e. replace y_out = model(X_batch) on line 184 with, for example, y_out = model.soft(model(X_batch)) or y_out = torch.nn.functional.softmax(model(X_batch), dim=1). A short sketch of this validation step is shown after this list.
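
For concreteness, a minimal sketch of that multi-class validation step, assuming raw logits of shape [batch_size, num_classes, H, W] and integer ground truth of shape [batch_size, H, W]; the shapes and variable names below are illustrative stand-ins, not the exact ones in train.py:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the real model output and ground truth.
B, num_classes, H, W = 2, 3, 128, 128
y_out = torch.randn(B, num_classes, H, W)           # raw logits, as returned by model(X_batch)
y_batch = torch.randint(0, num_classes, (B, H, W))  # integer class labels 0..2

probs = F.softmax(y_out, dim=1)               # per-pixel probabilities, sum to 1 over dim=1
pred = probs.argmax(dim=1)                    # predicted class per pixel, shape [B, H, W]
pixel_acc = (pred == y_batch).float().mean()  # simple pixel accuracy, just for illustration
print(pixel_acc.item())
```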

Then, as @jeya-maria-jose mentioned, instead of modifying the mask you need to remove those lines and assume that the ground truth (gt) contains integer class values (0, 1, or 2 in this case). For a simpler interpretation it is also easier to save the validation predictions for more than just the 1st channel, e.g. change line 214 to cv2.imwrite(fulldir+image_filename, yHaT[0,1:,:,:].transpose(1, 2, 0)), with optional zero-padding or keeping the background layer to avoid errors when saving two-channel images. The resulting mask will have num_classes-1 layers (no background), and each layer will contain 255 only where the model detects the corresponding object at that pixel (for example, 255 at position $(i, j)$ in the first layer means an object of class 1 is at $(i, j)$, and 255 at $(i, j)$ in the second layer means an object of class 2 is at that position).
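
A minimal sketch of that saving step, assuming the per-pixel class predictions are already available as integer indices; the output path and variable names are illustrative, not the exact ones in train.py:

```python
import numpy as np
import cv2

# Hypothetical example: turn per-pixel class predictions into per-class 0/255 layers.
num_classes = 3
pred = np.random.randint(0, num_classes, size=(128, 128))  # stand-in for the argmax prediction

# One layer per foreground class: 255 where that class is predicted, 0 elsewhere.
layers = np.stack(
    [np.where(pred == c, 255, 0).astype(np.uint8) for c in range(1, num_classes)],
    axis=-1,
)  # shape [H, W, num_classes-1]

# With 3 classes this gives 2 layers; pad with a zero layer so the result
# can be written as an ordinary 3-channel image.
if layers.shape[-1] == 2:
    layers = np.concatenate([layers, np.zeros_like(layers[..., :1])], axis=-1)

cv2.imwrite("prediction_example.png", layers)  # illustrative output path
```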

twofeetcat commented 1 year ago

Hello, I have a question: during training, does output = model(X_batch) contain a probability map of whether a pixel belongs to each object? Does that mean the values in the tensor are numbers from 0 to 1? Secondly, in the training phase, do I need to process y_batch (in this case num_classes = 20, pixel values 0, 1, 2, ..., 19), or can I directly calculate the loss from it and output with loss = criterion(output, y_batch)?
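
For reference, a minimal sketch of the second point, assuming the criterion behaves like standard cross-entropy (as the LogNLLLoss in metrics.py is described above): integer class labels can be passed directly as targets without one-hot encoding. The shapes, num_classes, and use of torch.nn.CrossEntropyLoss below are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

# Hypothetical shapes for a 20-class segmentation problem.
B, num_classes, H, W = 2, 20, 64, 64
output = torch.randn(B, num_classes, H, W)           # raw logits from the model
y_batch = torch.randint(0, num_classes, (B, H, W))   # integer labels 0..19, no one-hot needed

criterion = nn.CrossEntropyLoss()    # applies log-softmax over dim=1 internally
loss = criterion(output, y_batch)    # works directly with integer class indices
print(loss.item())
```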