zhixuhao / unet

unet for image segmentation
MIT License

adjustData function - purpose of reshaping for multi-class prediction #86

Open kl-31 opened 5 years ago

kl-31 commented 5 years ago

I'm trying to understand the adjustData function in data.py, particularly for multi-class prediction. The conversion to one-hot makes sense. However, I don't get the purpose of the following reshape code:

    new_mask = np.reshape(new_mask,(new_mask.shape[0],new_mask.shape[1]*new_mask.shape[2],new_mask.shape[3])) if flag_multi_class else np.reshape(new_mask,(new_mask.shape[0]*new_mask.shape[1],new_mask.shape[2]))

Shouldn't the inputs to the model be retained as 2-D images? Why is the image new_mask.shape[1],new_mask.shape[2] being reshaped into a vector? Also, why is this applied only to mask and not to img? This reshaping isn't done for binary classification, so I'm wondering what's different about the multi-class case.
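
For reference, here is a small NumPy sketch of what the multi-class branch of that reshape does to the array shape. The sizes and variable names are made up for illustration and are not taken from the repo. It only shows that the two spatial axes are merged into one (in row-major order) and that nothing is lost in the process.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, rows, cols, num_class = 2, 4, 4, 3

# Integer label mask: one class index per pixel.
labels = np.random.randint(0, num_class, size=(batch, rows, cols))

# One-hot encode: (batch, rows, cols) -> (batch, rows, cols, num_class).
one_hot = np.eye(num_class, dtype=np.uint8)[labels]

# The multi-class reshape from adjustData: merge the two spatial axes.
flat = np.reshape(one_hot, (one_hot.shape[0],
                            one_hot.shape[1] * one_hot.shape[2],
                            one_hot.shape[3]))

# Reshaping back recovers the original 2-D layout exactly.
restored = flat.reshape(batch, rows, cols, num_class)

print(one_hot.shape, "->", flat.shape)
```

So the mask is still the same image, just stored as a list of per-pixel one-hot vectors. Whether the model needs it in that form depends on the shape of the model's output.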

Incidentally, shouldn't the model loss defined under model.py be categorical cross-entropy, in the case of multi-class prediction? Perhaps the code isn't quite finished for multi-class usage. Anybody try running this for multi-class? Thanks in advance for any thoughts/help/clarification.

JoeyJoeJoeJunior commented 5 years ago

Hi, I am trying the same thing: changing the code to segment images with 5 classes. As far as I understand, you are right about the loss function; it should be 'categorical_crossentropy'. My problem is this error message:

    ValueError: Error when checking target: expected conv2d_24 to have 4 dimensions, but got array with shape (2, 65536, 5)

It seems I got the masking wrong somehow. Did you manage to fix your problem with semantic segmentation of multiple classes?

kl-31 commented 5 years ago

You can try the author's other repo unet-multi. I asked a similar question there. I wonder if there is something about the fit generator that only takes 2-D mask inputs. I haven't tried to run a multi-class prediction.

JianyingLi commented 5 years ago

Yeah, I think this code cannot work for the multi-class case. The ground truth has shape (batch_size, n_row*n_col, num_class), but the model output (conv10) has shape (batch_size, n_row, n_col, 1). However, they should be consistent, since they are y and y hat.
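
The mismatch can be reproduced with placeholder arrays, without building the model. This is only a sketch: the shapes are taken from the error message earlier in the thread (batch of 2, 256x256 images, 5 classes), the 1-channel output is assumed from the final layer of the original model, and `shapes_match` is a hypothetical helper.

```python
import numpy as np

def shapes_match(y_true, y_pred):
    """Return True when target and prediction have identical shapes."""
    return y_true.shape == y_pred.shape

batch, rows, cols, num_class = 2, 256, 256, 5

# Target as produced by the multi-class branch of adjustData.
y_true = np.zeros((batch, rows * cols, num_class), dtype=np.uint8)

# Output of the original model: a 1-channel image per sample (assumed).
y_pred_original = np.zeros((batch, rows, cols, 1), dtype=np.float32)

# Output of a model that ends in num_class channels and is then flattened.
y_pred_flat = np.zeros((batch, rows * cols, num_class), dtype=np.float32)

print(shapes_match(y_true, y_pred_original))  # shapes differ
print(shapes_match(y_true, y_pred_flat))      # shapes agree
```

Either the target has to keep its 2-D layout with num_class channels, or the model output has to be flattened the same way as the target.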

ynwh commented 5 years ago

Same question. Are there other, more sophisticated U-Net Keras implementations available? Many thanks for any answers and help.

mazatov commented 5 years ago

Anybody had luck with this? I have some solution but not sure if it'll work.

I have also been trying to work this out, as I have 20 classes! The reshaping makes no sense to me, but even if I comment it out, I still get errors. It seems something needs to change in the way the Unet model is set up. The last layer of the model has only 1 channel, whereas it should have a shape that matches our mask, which in my case is (768,768,20) after adjust_data (num_classes = 20; 768 is the image size).

I changed the ending of the Unet function to the following. I changed conv9 and conv10 to 20 channels to get the right number of output channels. I also changed sigmoid to softmax; I remember softmax working better for multi-class (not sure). I also changed the loss to categorical_crossentropy, as was suggested earlier here. This way at least I don't get errors. I'm training now and will report back on how it worked.

    conv9 = Conv2D(20, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv10 = Conv2D(20, 1, activation = 'softmax')(conv9)
    # Keras 2 expects the keyword names inputs/outputs here
    model = Model(inputs = inputs, outputs = conv10)
    # newer Keras versions name this argument learning_rate instead of lr
    model.compile(optimizer = Adam(lr = 1e-4), loss = 'categorical_crossentropy', metrics = ['accuracy'])

mazatov commented 5 years ago

Reporting back. It trained without problems and seemingly reached a low loss, but the outputs on test images were complete nonsense: all pixels in the image had label 1. It seems it is not training on the data correctly...

mazatov commented 5 years ago

Finally got it to work. Here are the changes! The issue with more than one class is that you need to compare the output with the labels using categorical cross-entropy. But in Keras, categorical cross-entropy can only compare vectors that have been one-hot encoded. That's why at the end of Unet I convert the final layer to the vector shape. Similarly, in adjust_data you need to convert the label to the one-hot vector shape so that categorical cross-entropy can do its job.
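
As a rough illustration of the label conversion described here, the following NumPy-only sketch turns an integer mask into flattened one-hot vectors. `mask_to_flat_one_hot` is a hypothetical helper, not part of the repo or of Keras, and it assumes the mask already stores class indices 0..num_class-1. A mask stored with other gray values (for example 0 and 255) would need to be remapped first.

```python
import numpy as np

def mask_to_flat_one_hot(mask, num_class):
    """(batch, rows, cols) integer labels -> (batch, rows*cols, num_class) one-hot."""
    mask = np.asarray(mask)
    if mask.min() < 0 or mask.max() >= num_class:
        raise ValueError("labels must lie in [0, num_class)")
    # Merge the two spatial axes, keeping the batch axis.
    flat = mask.reshape(mask.shape[0], -1).astype(np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_class, dtype=np.uint8)[flat]

# Tiny example: one 2x2 mask with three classes.
mask = np.array([[[0, 1],
                  [2, 1]]])
one_hot = mask_to_flat_one_hot(mask, num_class=3)
print(one_hot.shape)
```

The range check is there because out-of-range labels are an easy way to end up with a target that does not mean what you think it means.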

FYI, the performance of UNet does decrease significantly as labels are added. I saw noticeable decreases with num_class > 5, but it all depends on the problem. My objects are pretty similar to each other, which might be causing the issue.

This is the modified Unet function:

def unet(pretrained_weights = None,input_size = (256,256,1), num_class = 2):
    inputs = Input(input_size)
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool1)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool2)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool3)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv4)
    drop4 = Dropout(0.5)(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(drop4)

    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool4)
    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
    drop5 = Dropout(0.5)(conv5)

    up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(drop5))
    merge6 = concatenate([drop4,up6], axis = 3)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge6)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv6)

    up7 = Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv6))
    merge7 = concatenate([conv3,up7], axis = 3)
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge7)
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv7)

    up8 = Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv7))
    merge8 = concatenate([conv2,up8], axis = 3)
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge8)
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv8)

    up9 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv8))
    merge9 = concatenate([conv1,up9], axis = 3)
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge9)
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv9 = Conv2D(num_class, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv10 = Conv2D(num_class, 1, activation = 'softmax')(conv9)
    conv11 = Reshape([input_size[0]*input_size[1],num_class])(conv10)

    # Keras 2 expects the keyword names inputs/outputs here
    model = Model(inputs = inputs, outputs = conv11)

    model.compile(optimizer = Adam(lr = 2e-5), loss = 'categorical_crossentropy', metrics = ['accuracy'])
    #model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy', metrics = ['accuracy'])

    model.summary()

    if(pretrained_weights):
        model.load_weights(pretrained_weights)

    return model

In data.py, add

from keras.utils import to_categorical

and change part of the adjust_data function to the following

    if(flag_multi_class):
        img = img / 255
        mask = mask[:,:,:,0] if(len(mask.shape) == 4) else mask[:,:,0]
        new_mask = np.reshape(mask,[mask.shape[0],mask.shape[1]*mask.shape[2]])
        new_mask = to_categorical(new_mask, num_classes=num_class, dtype='uint8')
        mask = new_mask

Ekinkit commented 5 years ago

@mazatov Thank you so much for your idea. My problem now is how to predict on RGB images with the model I have trained. I used the code provided by the author, but the output image looks like a straight line.

Can anyone give me some ideas to deal with that?
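
One possible cause, assuming the model was trained with the flattened output from the modified Unet above: each prediction then has shape (rows*cols, num_class), and saving that array directly as an image gives a very tall, very narrow strip. A minimal sketch of turning such a prediction back into a 2-D label image follows. `decode_prediction`, `colorize`, and the palette are hypothetical names and values, not part of the repo.

```python
import numpy as np

def decode_prediction(pred, rows, cols):
    """(rows*cols, num_class) probabilities -> (rows, cols) class-index image."""
    pred = np.asarray(pred)
    if pred.shape[0] != rows * cols:
        raise ValueError("prediction length does not match rows * cols")
    # Pick the most likely class per pixel, then restore the 2-D layout.
    return pred.argmax(axis=-1).reshape(rows, cols).astype(np.uint8)

def colorize(label_img, palette):
    """Map each class index to an RGB color using a (num_class, 3) palette."""
    return np.asarray(palette, dtype=np.uint8)[label_img]

# Tiny example: a 2x2 image with three classes.
pred = np.array([[0.90, 0.05, 0.05],
                 [0.10, 0.80, 0.10],
                 [0.20, 0.20, 0.60],
                 [0.30, 0.40, 0.30]])
label_img = decode_prediction(pred, rows=2, cols=2)
rgb = colorize(label_img, palette=[[0, 0, 0], [255, 0, 0], [0, 255, 0]])
print(label_img)
print(rgb.shape)
```

The resulting `rgb` array has the usual (rows, cols, 3) layout and can be written out with whatever image-saving routine you already use.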

zhouhao-learning commented 5 years ago

@mazatov

    if(flag_multi_class):
        img = img / 255
        mask = mask[:,:,:,0] if(len(mask.shape) == 4) else mask[:,:,0]
        new_mask = np.reshape(mask,[mask.shape[0],mask.shape[1]*mask.shape[2]])
        new_mask = to_categorical(new_mask, num_classes=num_class, dtype='uint8')
        mask = new_mask

Hello, I used the code above to process my mask image, but I got the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-38-d7a7b83ff0f6> in <module>
      2 mask = img_to_array(mask)
      3 mask = mask[:,:,:,0] if(len(mask.shape) == 4) else mask[:,:,0]
----> 4 new_mask=np.reshape(mask,[mask.shape[0],mask.shape[1]*mask.shape[2]])
      5 new_mask = to_categorical(new_mask, num_classes=2, dtype='uint8')
      6 mask = new_mask

IndexError: tuple index out of range

My mask image has 3 channels. What's the matter?

I have ultrasound data with 2 categories in total: benign and malignant. Every image has a corresponding mask label. I want to do multi-class detection with Unet, but I don't know how to modify the model or how to process my data into multiple categories, and I don't understand how my labels should be defined. Can you teach me more? Thank you!

mazatov commented 5 years ago

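Looks like your mask has fewer dimensions than the code expects; the "tuple" in the error is mask.shape. In the code above, mask is a 4-D array (batch, rows, cols, channels).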
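
The error can be reproduced with a placeholder array. This sketch assumes the mask was loaded as a single image of shape (rows, cols, channels), which is what the traceback suggests. The 256x256 size is made up.

```python
import numpy as np

# A single mask as loaded from disk: (rows, cols, channels), no batch axis.
single_mask = np.zeros((256, 256, 3), dtype=np.uint8)

# The shape has 3 entries, so the else-branch runs and the result is 2-D.
reduced = single_mask[:, :, :, 0] if single_mask.ndim == 4 else single_mask[:, :, 0]

try:
    reduced.shape[2]          # a 2-D array has no third shape entry
    error = None
except IndexError as exc:
    error = str(exc)
print(reduced.shape, error)

# Adding a batch axis first gives the 4-D layout the snippet expects.
batched = single_mask[np.newaxis, ...]
reduced_b = batched[:, :, :, 0] if batched.ndim == 4 else batched[:, :, 0]
flat = reduced_b.reshape(reduced_b.shape[0], reduced_b.shape[1] * reduced_b.shape[2])
print(flat.shape)
```

Inside the training generator the batch axis is already present, which is why the same lines run without error there.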

FrancisDacian commented 5 years ago

I have the same problem.

vijay2411 commented 3 years ago

Can somebody please explain what the three dimensions of mask are in @mazatov's adjust_data change? I just have a 2D mask with the class index as each pixel's value. Is mask.shape[0] the batch size? I believed we were running it for one image and mask at a time. Please correct me if I am wrong. Thanks a lot.

jcarta commented 3 years ago

Is the reshape to (1, row*col, num_classes) necessary? Can I input (1, row, col, num_classes) into the model instead?
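
A NumPy sketch (not Keras itself) that bears on this question: if the cross-entropy is reduced over the last (class) axis, as in the hand-written function below, the mean loss is the same whether the spatial axes are flattened or not. Everything here is illustrative; the sizes, seed, and function are assumptions, and the result has not been checked against this repo's training code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-pixel cross-entropy, reduced over the last (class) axis."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

rng = np.random.default_rng(0)
batch, rows, cols, num_class = 1, 4, 4, 3

# Random one-hot targets and softmax-normalized predictions.
labels = rng.integers(0, num_class, size=(batch, rows, cols))
y_true = np.eye(num_class)[labels]
logits = rng.normal(size=(batch, rows, cols, num_class))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Loss with the 2-D layout kept: (batch, rows, cols, num_class).
loss_4d = categorical_crossentropy(y_true, y_pred).mean()

# Loss after flattening both arrays to (batch, rows*cols, num_class).
loss_flat = categorical_crossentropy(
    y_true.reshape(batch, rows * cols, num_class),
    y_pred.reshape(batch, rows * cols, num_class),
).mean()

print(np.isclose(loss_4d, loss_flat))
```

Under that assumption, ending the model at conv10 and feeding targets of shape (1, row, col, num_classes) should be equivalent, as long as the target and the model output are shaped the same way.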