utkuozbulak / pytorch-cnn-visualizations

Pytorch implementation of convolutional neural network visualization techniques
MIT License

Running Resnet 101 torchvision model fails for gradcam (activation maps) #3

Closed awadsb1 closed 6 years ago

awadsb1 commented 6 years ago

Alexnet works fine, but Resnet 101 fails.

I set pretrained_model = models.resnet101(pretrained=True) in get_params() in misc_functions.py.
Then in forward_pass_on_convolutions() in gradcam, resnet models don't have model.features, so:

for module_pos, module in self.model.features._modules.items():

fails.

Any suggestions? self.model has a _modules property, but iterating over that did not work either, and it doesn't seem to show all the tensors/layers of ResNet 101.
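For reference, _modules only holds a module's direct children, which is probably why the inner layers appear to be missing. The sketch below uses a small stand-in network (the TinyNet and Block classes are made up for illustration and are not torchvision's ResNet) to show the difference between named_children(), which mirrors _modules, and named_modules(), which recurses into every nested module:

```python
import torch.nn as nn


class Block(nn.Module):
    """Hypothetical stand-in for a ResNet bottleneck (illustration only)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn1(self.conv1(x)))


class TinyNet(nn.Module):
    """Hypothetical model with the same kind of nesting as ResNet."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)
        self.layer1 = nn.Sequential(Block(4), Block(4))
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = self.layer1(self.conv1(x))
        return self.fc(x.mean(dim=(2, 3)))


model = TinyNet()

# Direct children only: this is what model._modules.items() exposes.
top_level = [name for name, _ in model.named_children()]

# Every nested module, addressed by a dotted path such as 'layer1.0.conv1'.
# The root module itself is reported with an empty name, so it is skipped.
all_names = [name for name, _ in model.named_modules() if name]

print(top_level)
print(all_names)
```

On a real ResNet the same calls should list entries such as layer4.2.conv3, which is the layer targeted later in this thread.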

utkuozbulak commented 6 years ago

It should fail on ResNet. Most of the code in this repository assumes the model is separated into features (which contains the convolutional layers) and classifier (which contains the fully connected layers). See the General Info part of the README.

If you want to use this code on models that do not make this distinction, you will have to edit the parts where it calls .features or .classifier so that they target the specific layers/classes you are interested in.

It is especially awkward for ResNet models because of residual blocks (everything is nested) but as long as you are able to iterate and target the layer you should be fine.

To iterate properly, replace the iterable in for module_pos, module in self.model.features._modules.items(): with pretrained_model._modules.items(), because ResNet has no .features. But now you have to iterate over all the blocks, so be careful: everything is nested in layers and then in bottlenecks.

This is the partial output of ResNet101._modules.items()

. . .
('layer1', Sequential (
  (0): Bottleneck (
    (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (downsample): Sequential (
      (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    )
  )
  (1): Bottleneck (
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
  )
  (2): Bottleneck (
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
  )
)),
('layer2', Sequential (
  (0): Bottleneck (
    (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (relu): ReLU (inplace)
    (downsample): Sequential (
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    )
  )
. . .

I suggest understanding what the code does for basic models (e.g., AlexNet) before moving on to more complex ones. In the end, the only thing that should change is how and where you target the layers.
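One way to target a nested layer without walking the nesting by hand is to look it up by its dotted name and attach a forward hook, so that the model's own forward() still runs unchanged. This is only a sketch under assumptions: the toy model, the layer name layer1.1.0 and the save_output helper are hypothetical and are not part of this repository.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with ResNet-like nesting (not torchvision's model).
model = nn.Sequential()
model.add_module("conv1", nn.Conv2d(3, 4, kernel_size=3, padding=1))
model.add_module("layer1", nn.Sequential(
    nn.Sequential(nn.Conv2d(4, 8, kernel_size=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(8, 8, kernel_size=1), nn.ReLU()),
))
model.eval()

# Look the target up by dotted path instead of iterating over blocks manually.
target_name = "layer1.1.0"
target = dict(model.named_modules())[target_name]

captured = {}


def save_output(module, inputs, output):
    # Forward hooks receive the module, its inputs and its output.
    captured["activation"] = output


handle = target.register_forward_hook(save_output)
out = model(torch.randn(1, 3, 16, 16))  # the model's own forward() runs
handle.remove()  # detach the hook so it does not fire on later calls

print(captured["activation"].shape)
```

On a torchvision ResNet 101 the equivalent name for the last bottleneck's final convolution would presumably be layer4.2.conv3, matching the printed module structure above.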

awadsb1 commented 6 years ago

Thanks for the suggestions. I made many changes, the majority of which are below, to iterate on the nested resnet 101 layers:

SET target_layer = 'conv3' in gradcam.py

FUNCTION CHANGES:

def forward_pass_on_convolutions(self, x):
    """
        Does a forward pass on convolutions, hooks the function at given layer
    """
    conv_output = None
    if 'resnet' in str(type(self.model)):
        for layer_name, module in self.model._modules.items():
            outer_layer_name = layer_name
            if len(module._modules) > 0: # Resnet "layer#" outerlayers have inner layers.  The "layer#" aren't operational
                for bottleneck_num, btl_module in module._modules.items(): # Immediate inner layer is "BottleNeck". Again, not operational.  Here, layer_name = 'layer1' module = Sequential
                    bottleneck_number=bottleneck_num # here bottleneck_num = '0' btl_module is BottleNeck
                    for inner_bottle_layer_name, inner_btl_module in btl_module._modules.items(): # These should be actual layers.  Here, inner_bottle_layer_name = 'conv1'. module=Conv2d
                        if inner_bottle_layer_name == 'downsample': # These aren't real layers
                            continue
                        x = inner_btl_module(x)  # Forward
                        # print('*******\n{0}\n*******'.format(x.data))
                        print('{0}:\t{1}:\t{2}'.format(outer_layer_name, bottleneck_number, inner_bottle_layer_name))
                        print(x.data.sum())
                        if inner_bottle_layer_name == self.target_layer and outer_layer_name=='layer4' and bottleneck_number=='2':
                            x.register_hook(self.save_gradient)
                            conv_output = x  # Save the convolution output on that layer
                            return conv_output, x # Returning here should skip avgPool and fc as we desire
            else: #No inner layers, outer layer is what we want
                # print('*******\n{0}\n*******'.format(x.data))
                x = module(x)  # Forward
                print('{0}:\t{1}:\t{2}'.format(outer_layer_name, "NO_BOTTLENECK", layer_name))
                print(x.data.sum())
                if layer_name == self.target_layer:
                    x.register_hook(self.save_gradient)
                    conv_output = x  # Save the convolution output on that layer
                # if layer_name == 'avgpool': # Let's see if this makes a difference
                    return conv_output, x
        return conv_output, x

    else: # AlexNet/VGG models
        for module_pos, module in self.model.features._modules.items():
            x = module(x)  # Forward
            if int(module_pos) == self.target_layer:
                x.register_hook(self.save_gradient)
                conv_output = x  # Save the convolution output on that layer
        return conv_output, x

def forward_pass(self, x):
    """
        Does a full forward pass on the model
    """
    # Forward pass on the convolutions
    conv_output, x = self.forward_pass_on_convolutions(x)

    # Forward pass on the classifier
    if 'resnet' in str(type(self.model)):
        # We need to replace with the exact layers as we don't have the .classifier and .features
        # x = x.view(x.size(0), -1)  # Flatten
        x = self.model.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.model.fc(x)
        print('max_val: {0} index: {1}'.format(np.max(x.data.cpu().numpy()[0]), np.argmax(x.data.cpu().numpy()[0])))
    else:
        x = x.view(x.size(0), -1)  # Flatten
        x = self.model.classifier(x)
    return conv_output, x

def generate_cam(self, input_image, target_index=None):
    # Full forward pass
    # conv_output is the output of convolutions at specified layer
    # model_output is the final output of the model (1, 1000)
    conv_output, model_output = self.extractor.forward_pass(input_image)
    if target_index is None:
        target_index = np.argmax(model_output.data.numpy())
    # Target for backprop
    one_hot_output = torch.FloatTensor(1, model_output.size()[-1]).zero_()
    one_hot_output[0][target_index] = 1
    # Zero grads
    if 'resnet' in str(type(self.model)):  # ResNet
        for layer_name, module in self.model._modules.items(): # For the inner models
            outer_layer_name = layer_name
            if len(module._modules) > 0:  # Resnet "layer#" outerlayers have inner layers.  The "layer#" aren't operational
                for bottleneck_num, btl_module in module._modules.items():  # Immediate inner layer is "BottleNeck". Again, not operational.  Here, layer_name = 'layer1' module = Sequential
                    bottleneck_number = bottleneck_num  # here bottleneck_num = '0' btl_module is BottleNeck
                    for inner_bottle_layer_name, inner_btl_module in btl_module._modules.items():  # These should be actual layers.  Here, inner_bottle_layer_name = 'conv1'. module=Conv2d
                        if inner_bottle_layer_name == 'downsample':  # These aren't real layers
                            continue
                        inner_btl_module.zero_grad()
            else:
                module.zero_grad()  # for the outer modules
    else: # AlexNet/VGG
        self.model.features.zero_grad()
        self.model.classifier.zero_grad()
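As an aside on the zero-grad loop above: nn.Module.zero_grad() already covers the parameters of every nested submodule, so a single call on the root module should be enough for both model families. A minimal sketch on a hypothetical nested model (not the repository's code):

```python
import torch
import torch.nn as nn

# Hypothetical nested model, standing in for ResNet's layer/bottleneck nesting.
model = nn.Sequential(
    nn.Linear(4, 4),
    nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.Linear(4, 2))),
)

# Populate gradients on every parameter.
model(torch.randn(3, 4)).sum().backward()
had_grads = all(p.grad is not None for p in model.parameters())

# One call on the root module reaches parameters at every nesting depth.
model.zero_grad()

# Depending on the PyTorch version, gradients are either reset to None
# or filled with zeros; both count as cleared here.
cleared = all(
    p.grad is None or float(p.grad.abs().sum()) == 0.0
    for p in model.parameters()
)
print(had_grads, cleared)
```

Unlike the manual loop, this also clears the downsample branches, which do hold trainable parameters.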

However, the images I get out (attached) are not convincing. I tested the layer operations, and they do alter the "x" variable (the state vector that starts out as the input image).

image

image

image

Any suggestion?
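One likely reason for the unconvincing maps: chaining a block's children in registration order is not the same computation as the block's forward(). torchvision's Bottleneck applies a ReLU after bn1 and bn2 and adds the (optionally downsampled) input back before the final ReLU, while the loop above skips downsample and never performs the addition. The sketch below demonstrates the mismatch on a simplified, hypothetical residual block (ToyBottleneck is made up for illustration):

```python
import torch
import torch.nn as nn


class ToyBottleneck(nn.Module):
    """Simplified residual block; a hypothetical stand-in for Bottleneck."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))  # ReLU between the convs
        out = self.bn2(self.conv2(out))
        out = out + identity                      # skip connection
        return self.relu(out)


torch.manual_seed(0)
block = ToyBottleneck(4).eval()
x = torch.randn(1, 4, 8, 8)

with torch.no_grad():
    expected = block(x)

    # Chaining the children in registration order, as the loop above does:
    # conv1 -> bn1 -> conv2 -> bn2 -> relu, with no skip connection and
    # only a single ReLU at the end.
    chained = x
    for name, child in block.named_children():
        chained = child(chained)

same = torch.allclose(expected, chained)
print(same)
```

If the two outputs differ here, the activations fed to avgpool and fc in the manual loop are not the ones the pretrained network was trained to produce, so the predicted class and the resulting map may both be off.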

awadsb1 commented 6 years ago

Ok, here is an implementation of class activation maps with resnet:

https://github.com/metalbubble/CAM

It gives convincing results:

image

utkuozbulak commented 6 years ago

Like I said, as long as you are able to target the correct layers, it should be fine. Glad you were able to find a solution to your problem.
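For readers who want to stay with Grad-CAM on nested models, the idea can be sketched with hooks so that the network's own forward() is left untouched. Everything here is an assumption for illustration: the toy classifier, the target name features.2 and the keep_activation helper are hypothetical, and the weighting shown is the standard Grad-CAM recipe (spatially averaged gradients as channel weights) rather than this repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy classifier; 'features.2' plays the role of the target conv.
torch.manual_seed(0)
model = nn.Sequential()
model.add_module("features", nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
))
model.add_module("pool", nn.AdaptiveAvgPool2d(1))
model.add_module("flatten", nn.Flatten())
model.add_module("fc", nn.Linear(8, 5))
model.eval()

saved = {}


def keep_activation(module, inputs, output):
    saved["activation"] = output
    # Tensor hook: stores d(score)/d(activation) during backward().
    output.register_hook(lambda grad: saved.__setitem__("gradient", grad))


target_layer = dict(model.named_modules())["features.2"]
handle = target_layer.register_forward_hook(keep_activation)

image = torch.randn(1, 3, 16, 16)
logits = model(image)
handle.remove()

# Backpropagate the score of the predicted class only.
target_index = int(logits.argmax(dim=1))
model.zero_grad()
logits[0, target_index].backward()

# Grad-CAM: weight each channel by its spatially averaged gradient,
# sum over channels, then keep only the positive evidence.
weights = saved["gradient"].mean(dim=(2, 3), keepdim=True)          # (1, 8, 1, 1)
cam = F.relu((weights * saved["activation"]).sum(dim=1)).detach()   # (1, 16, 16)
print(cam.shape)
```

In practice the map would still need to be normalized and resized to the input resolution before being overlaid on the image.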