rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Two classes semantic segmentation with U-Net #511

Closed mgendarme closed 5 years ago

mgendarme commented 6 years ago

Dear all,

I am currently working on a semantic segmentation project. I am using U-Net as my encoder-decoder of choice, which looks like this:

```
> summary(model)
____________________________________________________________________________________________________
Layer (type)                        Output Shape            Param #     Connected to
====================================================================================================
input_2 (InputLayer)                (None, 512, 512, 1)     0
____________________________________________________________________________________________________
conv2d_16 (Conv2D)                  (None, 512, 512, 16)    160         input_2[0][0]
____________________________________________________________________________________________________
activation_15 (Activation)          (None, 512, 512, 16)    0           conv2d_16[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_8 (SpatialDropou  (None, 512, 512, 16)    0           activation_15[0][0]
____________________________________________________________________________________________________
conv2d_17 (Conv2D)                  (None, 512, 512, 16)    2320        spatial_dropout2d_8[0][0]
____________________________________________________________________________________________________
activation_16 (Activation)          (None, 512, 512, 16)    0           conv2d_17[0][0]
____________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)      (None, 256, 256, 16)    0           activation_16[0][0]
____________________________________________________________________________________________________
conv2d_18 (Conv2D)                  (None, 256, 256, 32)    4640        max_pooling2d_4[0][0]
____________________________________________________________________________________________________
activation_17 (Activation)          (None, 256, 256, 32)    0           conv2d_18[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_9 (SpatialDropou  (None, 256, 256, 32)    0           activation_17[0][0]
____________________________________________________________________________________________________
conv2d_19 (Conv2D)                  (None, 256, 256, 32)    9248        spatial_dropout2d_9[0][0]
____________________________________________________________________________________________________
activation_18 (Activation)          (None, 256, 256, 32)    0           conv2d_19[0][0]
____________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)      (None, 128, 128, 32)    0           activation_18[0][0]
____________________________________________________________________________________________________
conv2d_20 (Conv2D)                  (None, 128, 128, 64)    18496       max_pooling2d_5[0][0]
____________________________________________________________________________________________________
activation_19 (Activation)          (None, 128, 128, 64)    0           conv2d_20[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_10 (SpatialDropo  (None, 128, 128, 64)    0           activation_19[0][0]
____________________________________________________________________________________________________
conv2d_21 (Conv2D)                  (None, 128, 128, 64)    36928       spatial_dropout2d_10[0][0]
____________________________________________________________________________________________________
activation_20 (Activation)          (None, 128, 128, 64)    0           conv2d_21[0][0]
____________________________________________________________________________________________________
max_pooling2d_6 (MaxPooling2D)      (None, 64, 64, 64)      0           activation_20[0][0]
____________________________________________________________________________________________________
conv2d_22 (Conv2D)                  (None, 64, 64, 128)     73856       max_pooling2d_6[0][0]
____________________________________________________________________________________________________
activation_21 (Activation)          (None, 64, 64, 128)     0           conv2d_22[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_11 (SpatialDropo  (None, 64, 64, 128)     0           activation_21[0][0]
____________________________________________________________________________________________________
conv2d_23 (Conv2D)                  (None, 64, 64, 128)     147584      spatial_dropout2d_11[0][0]
____________________________________________________________________________________________________
activation_22 (Activation)          (None, 64, 64, 128)     0           conv2d_23[0][0]
____________________________________________________________________________________________________
conv2d_transpose_4 (Conv2DTranspos  (None, 128, 128, 64)    32832       activation_22[0][0]
____________________________________________________________________________________________________
concatenate_4 (Concatenate)         (None, 128, 128, 128)   0           conv2d_transpose_4[0][0]
                                                                        activation_20[0][0]
____________________________________________________________________________________________________
conv2d_24 (Conv2D)                  (None, 128, 128, 64)    73792       concatenate_4[0][0]
____________________________________________________________________________________________________
activation_23 (Activation)          (None, 128, 128, 64)    0           conv2d_24[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_12 (SpatialDropo  (None, 128, 128, 64)    0           activation_23[0][0]
____________________________________________________________________________________________________
conv2d_25 (Conv2D)                  (None, 128, 128, 64)    36928       spatial_dropout2d_12[0][0]
____________________________________________________________________________________________________
activation_24 (Activation)          (None, 128, 128, 64)    0           conv2d_25[0][0]
____________________________________________________________________________________________________
conv2d_transpose_5 (Conv2DTranspos  (None, 256, 256, 32)    8224        activation_24[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate)         (None, 256, 256, 64)    0           conv2d_transpose_5[0][0]
                                                                        activation_18[0][0]
____________________________________________________________________________________________________
conv2d_26 (Conv2D)                  (None, 256, 256, 32)    18464       concatenate_5[0][0]
____________________________________________________________________________________________________
activation_25 (Activation)          (None, 256, 256, 32)    0           conv2d_26[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_13 (SpatialDropo  (None, 256, 256, 32)    0           activation_25[0][0]
____________________________________________________________________________________________________
conv2d_27 (Conv2D)                  (None, 256, 256, 32)    9248        spatial_dropout2d_13[0][0]
____________________________________________________________________________________________________
activation_26 (Activation)          (None, 256, 256, 32)    0           conv2d_27[0][0]
____________________________________________________________________________________________________
conv2d_transpose_6 (Conv2DTranspos  (None, 512, 512, 16)    2064        activation_26[0][0]
____________________________________________________________________________________________________
concatenate_6 (Concatenate)         (None, 512, 512, 32)    0           conv2d_transpose_6[0][0]
                                                                        activation_16[0][0]
____________________________________________________________________________________________________
conv2d_28 (Conv2D)                  (None, 512, 512, 16)    4624        concatenate_6[0][0]
____________________________________________________________________________________________________
activation_27 (Activation)          (None, 512, 512, 16)    0           conv2d_28[0][0]
____________________________________________________________________________________________________
spatial_dropout2d_14 (SpatialDropo  (None, 512, 512, 16)    0           activation_27[0][0]
____________________________________________________________________________________________________
conv2d_29 (Conv2D)                  (None, 512, 512, 16)    2320        spatial_dropout2d_14[0][0]
____________________________________________________________________________________________________
activation_28 (Activation)          (None, 512, 512, 16)    0           conv2d_29[0][0]
____________________________________________________________________________________________________
conv2d_30 (Conv2D)                  (None, 512, 512, 1)     17          activation_28[0][0]
====================================================================================================
Total params: 481,745
Trainable params: 481,745
Non-trainable params: 0
____________________________________________________________________________________________________
```


My goal is to segment nuclei of human cells in microscopy images. I manage to get a good segmentation of the foreground objects as long as they are not too close together. If individual objects touch each other, my U-Net merges them during training. To solve this problem I would like to train on two classes:

- class one = the objects
- class two = the border of the objects only
- (class three = background in my case)

This approach was used by the winners of the Data Science Bowl 2018.
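Just to illustrate the idea, this is roughly how I think of assembling such a three-channel target (untested sketch; `obj_mask` and `border_mask` stand for 512 x 512 binary matrices I produce beforehand):

```r
# Stack object, border and background masks into one (512, 512, 3) array.
make_target <- function(obj_mask, border_mask) {
  background <- 1 - pmax(obj_mask, border_mask)  # pixels that are neither object nor border
  Y <- array(0, dim = c(512, 512, 3))
  Y[, , 1] <- obj_mask
  Y[, , 2] <- border_mask
  Y[, , 3] <- background
  Y
}
```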

My input image for training looks like:

```
> str(X)
 num [1:48, 1:512, 1:512, 1] 0.0412 0.0315 0.0322 0.0549 0.0325 ...
```

My input mask encoding the different classes looks like:

```
> str(Y)
 num [1:48, 1:512, 1:512, 1:3] 0 1 1 1 0 1 1 0 1 1 ...
```

The loss function that I am using is the following (Dice coefficient):

```r
dice_coef <- function(y_true, y_pred, smooth = 1.0) {
  y_true_f <- k_flatten(y_true)
  y_pred_f <- k_flatten(y_pred)
  intersection <- k_sum(y_true_f * y_pred_f)
  (2 * intersection + smooth) / (k_sum(y_true_f) + k_sum(y_pred_f) + smooth)
}

dice_coef_loss <- function(y_true, y_pred) -dice_coef(y_true, y_pred)
```
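As a quick (untested) sanity check, the coefficient can be evaluated on small toy tensors with the backend helpers:

```r
# Toy example: a near-perfect prediction should give a Dice value close to 1.
y_true <- k_constant(array(c(1, 0, 1, 0), dim = c(1, 2, 2, 1)))
y_pred <- k_constant(array(c(0.9, 0.1, 0.8, 0.2), dim = c(1, 2, 2, 1)))
k_eval(dice_coef(y_true, y_pred))  # roughly 0.88 here
```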

When I am trying to train the model with three classes:

```r
model <- model %>% compile(
  loss = dice_coef_loss,
  optimizer = 'adam',
  metrics = c(dice_coef)
)
```
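The model is then fit roughly like this (from memory, not the exact call; the batch size is a guess, epochs and split match the log below):

```r
# 48 images with a 0.2 validation split gives 38 training / 10 validation samples.
history <- model %>% fit(
  X, Y,
  batch_size = 16,
  epochs = 30,
  validation_split = 0.2
)
```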

I am getting this error:

```
Train on 38 samples, validate on 10 samples
Epoch 1/30
Error in py_call_impl(callable, dots$args, dots$keywords) :
  InvalidArgumentError: Incompatible shapes: [12582912] vs. [4194304]
     [[Node: metrics_1/dice_coef/Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](metrics_1/dice_coef/Reshape, metrics_1/dice_coef/Reshape_1)]]

Detailed traceback:
  File "/home/gendarme/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/engine/training.py", line 1042, in fit
    validation_steps=validation_steps)
  File "/home/gendarme/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/gendarme/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2661, in __call__
    return self._call(inputs)
  File "/home/gendarme/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2631, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/gendarme/.virtualenvs/r-tensorflow/lib/python2.7/s
```

Basically it seems that my target (Y) is three times bigger than expected: the flattened shapes in the error are 12582912 vs. 4194304, a factor of exactly 3, which matches the three channels of my mask. Unfortunately I do not know how to fix the problem. I suspect that either the shape of Y encoding the classes needs to be changed, or my loss function is not appropriate for my inputs X and Y.

I also had a look at categorical cross-entropy, which ran but delivered very bad results:

```r
model <- model %>% compile(
  loss = k_categorical_crossentropy,
  optimizer = 'adam',
  metrics = 'accuracy'
)
```

Maybe there is a problem with the custom loss function?

Does anyone see where the problem is?

Thanks a lot in advance for your help.

Mathieu

skeydan commented 6 years ago

Hi,

from looking at your model definition, your predicted values will be of shape (?, 512, 512, 1). Your ground truth has shape (?, 512, 512, 3) (because you are using 3 classes). The shapes need to match in order for the metric to work.
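A quick way to see this (untested sketch):

```r
# The final conv layer has a single filter, so predictions have 1 channel,
# while the target array has 3.
model$output_shape   # list(NULL, 512, 512, 1)
dim(Y)               # 48 512 512 3
```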

mgendarme commented 6 years ago

Thanks skeydan, that's already helping a lot. I used as a basis the script from a kaggle challenge participant: https://www.kaggle.com/mviterson/segmentation-using-r-keras-u-net-lb-0-294

Being new to computer vision, I still have a hard time understanding how to specify the presence of more than one class in the U-Net. Here is the code to build the U-Net:

```r
unet_layer <- function(object,
                       filters,
                       kernel_size = c(3, 3),
                       padding = "same",
                       kernel_initializer = "he_normal",
                       dropout = 0.1,
                       activation = "relu") {
  object %>%
    layer_conv_2d(filters = filters, kernel_size = kernel_size, padding = padding) %>%
    ## layer_batch_normalization() %>%
    layer_activation(activation) %>%
    layer_spatial_dropout_2d(rate = dropout) %>%
    layer_conv_2d(filters = filters, kernel_size = kernel_size, padding = padding) %>%
    ## layer_batch_normalization() %>%
    layer_activation(activation)
}

unet <- function(shape, nlevels = 4, nfilters = 16, dropouts = c(0.1, 0.1, 0.2, 0.2, 0.3)) {

  message("Constructing U-Net with ", nlevels, " levels initial number of filters is: ", nfilters)

  filter_sizes <- nfilters * 2^seq.int(0, nlevels)

  ## Loop over contracting layers
  clayers <- clayers_pooled <- list()

  ## inputs
  clayers_pooled[[1]] <- layer_input(shape = shape)

  for (i in 2:(nlevels + 1)) {
    clayers[[i]] <- unet_layer(clayers_pooled[[i - 1]],
                               filters = filter_sizes[i - 1],
                               dropout = dropouts[i - 1])

    clayers_pooled[[i]] <- layer_max_pooling_2d(clayers[[i]],
                                                pool_size = c(2, 2),
                                                strides = c(2, 2))
  }

  ## Loop over expanding layers
  elayers <- list()

  ## center
  elayers[[nlevels + 1]] <- unet_layer(clayers_pooled[[nlevels + 1]],
                                       filters = filter_sizes[nlevels + 1],
                                       dropout = dropouts[nlevels + 1])

  for (i in nlevels:1) {
    elayers[[i]] <- layer_conv_2d_transpose(elayers[[i + 1]],
                                            filters = filter_sizes[i],
                                            kernel_size = c(2, 2),
                                            strides = c(2, 2),
                                            padding = "same")

    elayers[[i]] <- layer_concatenate(list(elayers[[i]], clayers[[i + 1]]), axis = 3)
    elayers[[i]] <- unet_layer(elayers[[i]], filters = filter_sizes[i], dropout = dropouts[i])
  }

  ## Output layer
  outputs <- layer_conv_2d(elayers[[1]], filters = 1, kernel_size = c(1, 1), activation = "sigmoid")

  return(keras_model(inputs = clayers_pooled[[1]], outputs = outputs))
}

model <- unet(shape = SHAPE, nlevels = 3, nfilters = 16, dropouts = c(0.1, 0.1, 0.2, 0.3))
summary(model)
```

Actually, changing the number of filters coming out of the deconvolution was the solution. I will leave the post open in case there are other ideas.

skeydan commented 6 years ago

Not sure exactly what you did, but I think you could try something like this:

    ## Output layer
    outputs <- list(
      layer_conv_2d(
        elayers[[1]],
        filters = 1,
        kernel_size = c(1, 1),
        activation = "sigmoid",
        name = "output1"
      ),
      layer_conv_2d(
        elayers[[1]],
        filters = 1,
        kernel_size = c(1, 1),
        activation = "sigmoid",
        name = "output2"
      ),
      layer_conv_2d(
        elayers[[1]],
        filters = 1,
        kernel_size = c(1, 1),
        activation = "sigmoid",
        name = "output3"
      )
    )

    return(keras_model(inputs = clayers_pooled[[1]], outputs = outputs))
  }

This should give you a single-channel output between 0 and 1 for each of the 3 classes.

mgendarme commented 6 years ago

Thanks skeydan for the idea! I tried it and unfortunately it did not work: the output should be of shape (?, 512, 512, 3), whereas your proposition returns a list of 3 arrays of shape (?, 512, 512, 1). In the code I posted last time, changing the output layer to the line below solved the problem:

```r
## Output layer
outputs <- layer_conv_2d(elayers[[1]], filters = 3, kernel_size = c(1, 1), activation = "sigmoid")
```

I imagine that with your solution, replacing the list by an array of shape (?, 512, 512, 3), where each channel is "output1", "output2" or "output3", should solve the problem as well.
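Something like this, perhaps (untested; `out1`, `out2` and `out3` standing for the three single-channel conv layers from your suggestion):

```r
# Merge the three single-channel outputs along the channel axis
# into one (?, 512, 512, 3) tensor.
outputs <- layer_concatenate(list(out1, out2, out3), axis = 3)
```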

Many thanks again for your help

skeydan commented 6 years ago

Hi, glad you got it to work! I think both ways should have very similar effects (for the way I suggested above, the target arrays would also have had to be provided as a list of 3 arrays, but then it should have worked).
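Roughly like this (untested sketch, reusing the epochs and split from your log):

```r
# With three named outputs, the targets are supplied as a named list of
# single-channel arrays instead of one 3-channel array.
model %>% fit(
  x = X,
  y = list(output1 = Y[, , , 1, drop = FALSE],
           output2 = Y[, , , 2, drop = FALSE],
           output3 = Y[, , , 3, drop = FALSE]),
  epochs = 30,
  validation_split = 0.2
)
```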

mgendarme commented 6 years ago

Hi,

> the target arrays would also have had to be provided as a list of 3 arrays

After trying it, this solution indeed works as well! Thanks for the idea.