nyukat / BIRADS_classifier

High-resolution breast cancer screening with multi-view deep convolutional neural networks
https://arxiv.org/abs/1703.07047
BSD 2-Clause "Simplified" License

PyTorch model has two views instead of four? #12

Closed utkuozbulak closed 5 years ago

utkuozbulak commented 5 years ago

Hello there,

In the paper you say that the model has four views and that the pipelines for the four views are kept separate until global average pooling, after which they are concatenated. Something like the following ASCII art:

R-MLO  |-a1-|-a2-|-a3-|--|--|--|--|--|--| \
L-MLO  |-b1-|-b2-|-b3-|--|--|--|--|--|--|\ \
                                           > >--|--|--|--|
R-CC   |-c1-|-c2-|-c3-|--|--|--|--|--|--|/ /
L-CC   |-d1-|-d2-|-d3-|--|--|--|--|--|--| /

But when I look at the layers of the PyTorch model, only one conv layer is defined for MLO and one for CC, instead of two each. In the forward pass, the left and right images are put through the same layer. Something like:

L-CC  \           \           \
       \|-a1-/|-a1-\|-a2-/|-a2-\|-a3-/|-a3-|... 
R-CC        /           /           /

This results in a bizarre architecture like the following:

R-MLO  |-a1-|-a2-|-a3-|--|--|--|--|--|--| \
L-MLO  |-a1-|-a2-|-a3-|--|--|--|--|--|--|\ \
                                           > >--|--|--|--|
R-CC   |-b1-|-b2-|-b3-|--|--|--|--|--|--|/ /
L-CC   |-b1-|-b2-|-b3-|--|--|--|--|--|--| /

Here is the code for the conv layers, taken from layers_torch.py:


import torch.nn as nn
import torch.nn.functional as F


class AllViewsConvLayer(nn.Module):

    def __init__(self, in_channels, number_of_filters=32, filter_size=(3, 3), stride=(1, 1)):
        super(AllViewsConvLayer, self).__init__()
        self.cc = nn.Conv2d(
            in_channels=in_channels,
            out_channels=number_of_filters,
            kernel_size=filter_size,
            stride=stride,
        )
        self.mlo = nn.Conv2d(
            in_channels=in_channels,
            out_channels=number_of_filters,
            kernel_size=filter_size,
            stride=stride,
        )

    def forward(self, x):
        return {
            "L-CC": F.relu(self.cc(x["L-CC"])),    # [Addition] (1)
            "L-MLO": F.relu(self.mlo(x["L-MLO"])),
            "R-CC": F.relu(self.cc(x["R-CC"])),    # [Addition] (2)
            "R-MLO": F.relu(self.mlo(x["R-MLO"])),
        }

Notice that in lines (1) and (2), L-CC and R-CC are forwarded through the same layer; the same applies to L-MLO and R-MLO.

Here is what I get when I inspect the first layer of the model: there are only two conv layers instead of four. So the model effectively has two views, not four.

model._conv_layer_ls[0]
Out[20]: 
AllViewsConvLayer(
  (cc): Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2))
  (mlo): Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2))
)

For TF, however, the layers seem to be in line with the paper:

import tensorflow as tf


def all_views_conv_layer(input_layer, layer_name, number_of_filters=32, filter_size=(3, 3), stride=(1, 1),
                         padding='VALID', biases_initializer=tf.zeros_initializer()):
    """Convolutional layers across all 4 views"""

    input_l_cc, input_r_cc, input_l_mlo, input_r_mlo = input_layer

    with tf.variable_scope(layer_name + "_CC") as cc_cope:
        h_l_cc = tf.contrib.layers.convolution2d(inputs=input_l_cc, num_outputs=number_of_filters,
                                                 kernel_size=filter_size, stride=stride, padding=padding,
                                                 scope=cc_cope, biases_initializer=biases_initializer)
        h_r_cc = tf.contrib.layers.convolution2d(inputs=input_r_cc, num_outputs=number_of_filters,
                                                 kernel_size=filter_size, stride=stride, padding=padding, reuse=True,
                                                 scope=cc_cope, biases_initializer=biases_initializer)

    with tf.variable_scope(layer_name + "_MLO") as mlo_cope:
        h_l_mlo = tf.contrib.layers.convolution2d(inputs=input_l_mlo, num_outputs=number_of_filters,
                                                  kernel_size=filter_size, stride=stride, padding=padding,
                                                  scope=mlo_cope, biases_initializer=biases_initializer)
        h_r_mlo = tf.contrib.layers.convolution2d(inputs=input_r_mlo, num_outputs=number_of_filters,
                                                  kernel_size=filter_size, stride=stride, padding=padding, reuse=True,
                                                  scope=mlo_cope, biases_initializer=biases_initializer)

    h = (h_l_cc, h_r_cc, h_l_mlo, h_r_mlo)

    return h

Is this perhaps an oversight, or am I missing something? Does this PyTorch model achieve similar accuracy on the dataset?

Thanks.

zphang commented 5 years ago

Hi Utku,

You are right that there are 4 separate processing columns (for the four views), but only 2 CNN "modules" (CC/MLO). This is explained in the paper:

First, we tied the weights in the corresponding columns, i.e., the parameters of the columns processing L-CC and R-CC views were shared as were those of the columns processing L-MLO and R-MLO views.

Reusing the same module (e.g. the CC module) for both the L-CC and R-CC inputs is equivalent to the desired weight sharing. The TF implementation shares weights in the same way, but TF has different semantics, so the sharing is expressed explicitly as two separate computational nodes (note the reuse=True with the same variable scope).
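
To make the equivalence concrete, here is a minimal sketch; the toy shapes and sizes are made up, not taken from the repo:

import torch
import torch.nn as nn

# One conv module applied to both breast sides: the two forward passes use
# the identical weight tensor, which is exactly "two columns with tied weights".
cc = nn.Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2))

l_cc = torch.randn(1, 1, 64, 64)
r_cc = torch.randn(1, 1, 64, 64)
h_l_cc = cc(l_cc)
h_r_cc = cc(r_cc)

# Building an explicit second column and copying the weights over gives the
# same output, which is all the weight tying described in the paper requires.
cc_twin = nn.Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2))
cc_twin.load_state_dict(cc.state_dict())
assert torch.allclose(h_r_cc, cc_twin(r_cc))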

Incidentally, I have been following your work on implementing visualization/interpretability methods, and I'm wondering if you were trying to apply https://github.com/utkuozbulak/pytorch-cnn-visualizations to this. I also tried that! But if I'm guessing right, you also encountered issues with not being able to separate the gradients for the two forward passes on the same module.
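
What I mean by not being able to separate the gradients, roughly (again a toy example rather than repo code):

import torch
import torch.nn as nn

cc = nn.Conv2d(1, 8, kernel_size=(3, 3))
l_cc = torch.randn(1, 1, 32, 32)
r_cc = torch.randn(1, 1, 32, 32)

# Both views go through the same module, so backprop accumulates their
# gradients into the same cc.weight.grad buffer; there is no per-view
# breakdown left for a visualization method to read off.
loss = cc(l_cc).sum() + cc(r_cc).sum()
loss.backward()
print(cc.weight.grad.shape)  # one shared gradient tensor for both views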

Let me know if this answers your question.

utkuozbulak commented 5 years ago

Hello Jason,

Oh, so I was indeed missing something. I see your point! I think I was misled by Fig. 3, which is easy to interpret as the views having fully separate pipelines. Sorry about that.

No, I wasn't planning to use the vis. techniques there. I have a few weird ideas that I want to test on multi-view models, but, like you said, separating the gradients will be a problem. Thinking about it again, though: after the training phase is done, we can duplicate the two pipes (CC and MLO), which would let us keep track of the gradients for each image. It would be similar to the architecture below, except that a and b (and likewise c and d) would have the same weights. This way we can track the gradients without changing the overall behavior of the model, although the model would be almost twice its current size. A rough sketch follows the diagram.

R-MLO  |-a1-|-a2-|-a3-|--|--|--|--|--|--| \
L-MLO  |-b1-|-b2-|-b3-|--|--|--|--|--|--|\ \
                                           > >--|--|--|--|
R-CC   |-c1-|-c2-|-c3-|--|--|--|--|--|--|/ /
L-CC   |-d1-|-d2-|-d3-|--|--|--|--|--|--| /
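
Roughly what I have in mind, assuming the existing AllViewsConvLayer with its cc/mlo attributes; the class below is just my own sketch, not code from the repo:

import copy

import torch.nn as nn
import torch.nn.functional as F


class AllViewsConvLayerUntied(nn.Module):
    """Post-training variant: each view gets its own deep copy of the trained
    CC/MLO conv, so the forward passes no longer share a module and gradients
    can be tracked per image. Outputs are unchanged because the copied weights
    are numerically identical, but the parameter count roughly doubles."""

    def __init__(self, shared_layer):
        super(AllViewsConvLayerUntied, self).__init__()
        self.l_cc = copy.deepcopy(shared_layer.cc)
        self.r_cc = copy.deepcopy(shared_layer.cc)
        self.l_mlo = copy.deepcopy(shared_layer.mlo)
        self.r_mlo = copy.deepcopy(shared_layer.mlo)

    def forward(self, x):
        return {
            "L-CC": F.relu(self.l_cc(x["L-CC"])),
            "L-MLO": F.relu(self.l_mlo(x["L-MLO"])),
            "R-CC": F.relu(self.r_cc(x["R-CC"])),
            "R-MLO": F.relu(self.r_mlo(x["R-MLO"])),
        }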

This definitely answered my question, thank you for your time!