jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
MIT License

RuntimeError using custom architecture #323

Open Trotts opened 2 years ago

Trotts commented 2 years ago


I am trying to run Grad-CAM on a custom architecture I have created. The architecture is as follows:

```
  (convnet): Sequential(
    (0): Conv2d(3, 59, kernel_size=(6, 6), stride=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): Dropout(p=0.1694977121723289, inplace=False)
    (4): Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=297419, out_features=106, bias=True)
    (1): ReLU()
  )
```

This architecture is an embedding network, so I am using the [Pixel Attribution for Embeddings notebook](https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb) to try to generate a heatmap. For now, I am only running it on the notebook's default images.
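For context, the embeddings approach scores an image by the cosine similarity between the model's output embedding and a precomputed "concept" embedding, which is what the `SimilarityToConceptTarget` class in this issue's code does. The traceback in this issue shows the library calling each target on one sample's output (`target(output) for target, output in zip(targets, outputs)`), which is why `dim=0` works for a 1-D embedding. The toy vectors below are hypothetical; the sketch only shows that such a target returns a scalar that carries a gradient back to the embedding.

```python
import torch


class SimilarityToConceptTarget:
    """Scores one sample's embedding by cosine similarity to a fixed concept embedding."""

    def __init__(self, features):
        self.features = features

    def __call__(self, model_output):
        cos = torch.nn.CosineSimilarity(dim=0)
        return cos(model_output, self.features)


# Hypothetical toy values standing in for a concept embedding and one image's embedding.
concept = torch.tensor([1.0, 0.0, 0.0])
embedding = torch.tensor([1.0, 1.0, 0.0], requires_grad=True)

score = SimilarityToConceptTarget(concept)(embedding)
print(score.item())  # cos(45 degrees), about 0.7071

# Because the embedding requires grad, the score can be backpropagated.
score.backward()
print(embedding.grad)
```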

When running the "Where is the car in the image" cell, I run into the following error:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-9e1c4f908026> in <module>
     19              use_cuda=False) as cam:
     20     car_grayscale_cam = cam(input_tensor=input_tensor,
---> 21                         targets=car_targets)[0, :]

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in __call__(self, input_tensor, targets, aug_smooth, eigen_smooth)
    188         return self.forward(input_tensor,
--> 189                             targets, eigen_smooth)
    191     def __del__(self):

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in forward(self, input_tensor, targets, eigen_smooth)
     82             loss = sum([target(output)
     83                        for target, output in zip(targets, outputs)])
---> 84             loss.backward(retain_graph=True)
     86         # In most of the saliency attribution papers, the saliency is

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    305                 create_graph=create_graph,
    306                 inputs=inputs)
--> 307         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    309     def register_hook(self, hook):

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    154     Variable._execution_engine.run_backward(
    155         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 156         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

From a few other closed threads on this issue, it seems the fix has something to do with `with torch.no_grad()`.

However, I am at a complete loss as to where this needs to go, or whether it is a deeper problem with my custom embedding network. Any help would be greatly appreciated. I am running Python 3.6.9 and grad-cam 1.4.5.
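For reference, this RuntimeError is what PyTorch raises whenever `backward()` is called on a tensor that is not connected to the autograd graph. That happens if the forward pass ran inside `torch.no_grad()`, or if the output never depended on any parameter or input that requires grad. A minimal sketch, using a stand-in `nn.Linear` rather than the model from this issue:

```python
import torch
import torch.nn as nn

# Stand-in layer for illustration only; not the Network from this issue.
layer = nn.Linear(4, 2)
x = torch.randn(1, 4)

# With autograd enabled, the output is attached to the graph through the layer's weights.
out = layer(x).sum()
print(out.requires_grad)  # True

# Under torch.no_grad(), no graph is recorded, so backward() has nothing to traverse.
with torch.no_grad():
    detached_out = layer(x).sum()
print(detached_out.requires_grad)  # False

error_message = None
try:
    detached_out.backward()
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```

So the CAM call (and the model's forward pass inside it) needs to run with autograd enabled, and its output has to depend on the layers being explained.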

Code (dirs edited out):

```python
import pickle

import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# get_image_from_url comes from the tutorial notebook; Network and
# find_best_checkpoint are this project's own code (not shown).

# A model wrapper that gets a model and returns the features before the fully connected layer.
class FeatureExtractor(torch.nn.Module):
    def __init__(self, model):
        super(FeatureExtractor, self).__init__()
        self.model = model
        self.feature_extractor = torch.nn.Sequential(*list(self.model.children())[:-1])

    def __call__(self, x):
        # [:, :, 0, 0] keeps only spatial position (0, 0) of the final feature map
        return self.feature_extractor(x)[:, :, 0, 0]

### Load the model with the optimal hyperparams previously located, based on best model checkpoint
best_params_src = $SRC  # path edited out

with open(best_params_src, 'rb') as f:
    best_params = pickle.load(f)

resize_to = (300, 300)
model_input_shape = [1, 3, resize_to[0], resize_to[1]]

### Model
log_dir = $SRC  # path edited out

loader = torch.load(find_best_checkpoint(log_dir, n_way=True))
model_loader = loader['Model']

# Create a blank embedding model to build upon when loading in
loaded_model = Network(nlayers=best_params['nlayers'],
                       hidden_size=best_params['hidden_size'],
                       kernel_size=best_params['kernel_size'],
                       dropout=best_params['dropout'],
                       expected_img_shape=model_input_shape,
                       emb_size=best_params['emb_size'])

# If multiple gpus then parallelise the model
if cuda and num_gpus > 1:
    loaded_model = nn.DataParallel(loaded_model)

# Load, set to eval, create FeatureExtractor
# (loading the checkpoint weights and calling .eval() do not appear in this snippet)
model = FeatureExtractor(loaded_model)

car_img, car_img_float, car_tensor = get_image_from_url("https://www.wallpapersin4k.org/wp-content/uploads/2017/04/Foreign-Cars-Wallpapers-4.jpg")
cloud_img, cloud_img_float, cloud_tensor = get_image_from_url("https://th.bing.com/th/id/OIP.CmONj_pGCXg9Hq9-OxTD9gHaEo?pid=ImgDet&rs=1")
car_concept_features = model(car_tensor)[0, :]
cloud_concept_features = model(cloud_tensor)[0, :]

Image.fromarray(np.hstack((cloud_img, car_img)))

class SimilarityToConceptTarget:
    def __init__(self, features):
        self.features = features

    def __call__(self, model_output):
        cos = torch.nn.CosineSimilarity(dim=0)
        return cos(model_output, self.features)

target_layers = [loaded_model.module.convnet[-1]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
# (input_tensor and image_float are assumed to come from an earlier notebook cell)
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]  # <---- ERROR HERE

car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
```
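One thing that may be worth checking in the wrapper above (an observation, not something confirmed in this thread): `nn.DataParallel` registers the wrapped network as its only child, `module`. If `loaded_model` is the `DataParallel` wrapper, which the later use of `loaded_model.module.convnet` suggests, then `list(loaded_model.children())[:-1]` is empty, the resulting `nn.Sequential` just returns its input, and the `Conv2d` layers never run. The sketch below uses a hypothetical, scaled-down `TinyNet` with the same `convnet`/`fc` layout to show this, and builds the extractor from the unwrapped network instead.

```python
import torch
import torch.nn as nn


# Hypothetical, scaled-down stand-in for the issue's Network (convnet + fc).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.convnet = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3),  # 32 -> 30
            nn.MaxPool2d(2),                 # 30 -> 15
            nn.ReLU(),
            nn.Conv2d(4, 4, kernel_size=3),  # 15 -> 13
            nn.MaxPool2d(2),                 # 13 -> 6
        )
        self.fc = nn.Sequential(nn.Linear(4 * 6 * 6, 8), nn.ReLU())

    def forward(self, x):
        return self.fc(torch.flatten(self.convnet(x), 1))


net = TinyNet()
wrapped = nn.DataParallel(net)  # can also be constructed on a CPU-only machine

# DataParallel's only child is the wrapped network itself.
wrapped_children = list(wrapped.children())
print(len(wrapped_children))       # 1
print(len(wrapped_children[:-1]))  # 0, i.e. an empty Sequential (identity)

# Building the extractor from the unwrapped network keeps convnet in the forward pass.
inner = wrapped.module
feature_extractor = nn.Sequential(*list(inner.children())[:-1])
device = next(inner.parameters()).device  # DataParallel may have moved the module to a GPU
features = feature_extractor(torch.randn(1, 3, 32, 32, device=device))
print(features.shape)          # torch.Size([1, 4, 6, 6])
print(features.requires_grad)  # True
```

If this is what happens here, passing `loaded_model.module` instead of `loaded_model` to `FeatureExtractor` would be one way to keep `convnet` in the forward pass. The `[:, :, 0, 0]` slice, written for a backbone that ends in a 1 x 1 map, may then also need revisiting, since `convnet` here ends in a 59 x 71 x 71 map.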
Trotts commented 2 years ago


I fixed the above error by calling

```python
# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]
```

However I now seem to be running into the following:

```
An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0

NameError                                 Traceback (most recent call last)
<ipython-input-9-5cdc1cb41639> in <module>
---> 23 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
     24 Image.fromarray(car_cam_image)

NameError: name 'car_grayscale_cam' is not defined
```

Looking back through the closed issues on this topic, it seems to be a problem with the layers I specify in `target_layers`. Am I correct in thinking this?

If so, I currently select `loaded_model.module.convnet[-1]` as the target, which corresponds to:

`MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)`

Any help with the above is greatly appreciated :)
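The AxisError message is what NumPy produces when it is asked to average over axes 2 and 3 of something that is not a 4-D array. Grad-CAM's channel weights are the target layer's gradients averaged over its spatial (height, width) axes, so one plausible way to reach this message is that no gradients were captured for the target layer. The NumPy-only sketch below uses `None` and random data as stand-ins for "nothing captured" and "captured `B x C x H x W` gradients"; it reproduces the message but does not call pytorch-grad-cam itself.

```python
import numpy as np


def gradcam_weights(grads):
    """Grad-CAM channel weights: average the gradients over height and width (axes 2, 3)."""
    return np.mean(grads, axis=(2, 3))


# Stand-in for a target layer whose hooks captured nothing.
missing_message = None
try:
    gradcam_weights(None)
except ValueError as err:  # NumPy's AxisError subclasses ValueError and IndexError
    missing_message = str(err)
print(missing_message)  # axis 2 is out of bounds for array of dimension 0

# Stand-in for gradients of a Conv2d output shaped batch x channels x height x width.
grads = np.random.default_rng(0).random((1, 59, 143, 143))
weights = gradcam_weights(grads)
print(weights.shape)  # (1, 59)
```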

jacobgil commented 2 years ago

Hi, can you please try `loaded_model.module.convnet[-2]` and tell me whether it works?

We need the 2D CNN activations before the pooling.
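Following that advice, a small helper can pick the last `Conv2d` in a `Sequential` instead of hard-coding an index. `last_conv2d` is a hypothetical helper, not part of pytorch-grad-cam; it is shown here on a stand-in `Sequential` with the same layer order as the issue's `convnet` (dropout rate rounded).

```python
import torch.nn as nn


def last_conv2d(seq: nn.Sequential) -> nn.Conv2d:
    """Return the last Conv2d in seq, i.e. the layer whose activations precede the final pooling."""
    convs = [m for m in seq if isinstance(m, nn.Conv2d)]
    if not convs:
        raise ValueError("no Conv2d layer found")
    return convs[-1]


# Same layer order as the issue's convnet.
convnet = nn.Sequential(
    nn.Conv2d(3, 59, kernel_size=6),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Dropout(0.17),
    nn.Conv2d(59, 59, kernel_size=5),
    nn.MaxPool2d(2),
)
target = last_conv2d(convnet)
print(target is convnet[-2])  # True
```

With the issue's model this would read something like `target_layers = [last_conv2d(loaded_model.module.convnet)]`.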

Trotts commented 2 years ago

Hi @jacobgil, I tried the suggestion but the same error occurs:

```python
target_layers = [loaded_model.module.convnet[-2]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]

car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
```

Results in:

```
An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0

NameError                                 Traceback (most recent call last)
<ipython-input-13-784011a42087> in <module>
     20                         targets=car_targets)[0, :]
---> 22 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
     23 Image.fromarray(car_cam_image)

NameError: name 'car_grayscale_cam' is not defined
```

Am I correct in thinking that `target_layers = [loaded_model.module.convnet[-2]]` is where you wanted the change to be made?
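Whether a given entry of `target_layers` is actually reached can be checked independently of the CAM code: register a temporary forward hook on the layer and run one forward pass through the exact model passed to `GradCAM`. If the hook never fires, the CAM code has no activations or gradients for that layer, whichever layer is chosen. The sketch uses PyTorch's `register_forward_hook`; `inspect_layer` and the two stand-in models are hypothetical.

```python
import torch
import torch.nn as nn


def inspect_layer(model, layer, input_tensor):
    """Run one forward pass and return the layer's output shape, or None if it never ran."""
    shapes = []
    handle = layer.register_forward_hook(
        lambda module, inputs, output: shapes.append(tuple(output.shape))
    )
    try:
        model(input_tensor)
    finally:
        handle.remove()  # always detach the temporary hook
    return shapes[0] if shapes else None


conv = nn.Conv2d(3, 8, kernel_size=5)
model_using_conv = nn.Sequential(conv, nn.MaxPool2d(2))
model_skipping_conv = nn.Sequential()  # an empty Sequential returns its input unchanged

x = torch.randn(1, 3, 64, 64)
used_shape = inspect_layer(model_using_conv, conv, x)
skipped_shape = inspect_layer(model_skipping_conv, conv, x)
print(used_shape)     # (1, 8, 60, 60)
print(skipped_shape)  # None
```

In the issue's setting this would be something like `inspect_layer(model, target_layers[0], input_tensor)`; a `None` result would point at the model wrapper rather than at the choice of layer.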

jacobgil commented 2 years ago

Hi, sorry for the late response, I was traveling.

What does `target_layers` look like now? What output shape do you expect from that layer? The CAM algorithms expect it to have the shape batch x channels x height x width. Is that what we have there?

If the dimension is different, we will need to write a `reshape_transform`.

Also, what is the dimension of `input_tensor`?
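A `reshape_transform` is a function applied to the target layer's output so that it has the batch x channels x height x width layout the CAM code expects, and it is passed to the CAM constructor as `reshape_transform=...`. As an illustration only (the `Conv2d` output in this issue is already 4-D), the sketch assumes a hypothetical layer that emits batch x tokens x channels, with the tokens laid out row by row on a 7 x 7 grid, in the spirit of the library's vision-transformer examples.

```python
import torch


def reshape_transform(tensor, height=7, width=7):
    """Hypothetical transform: batch x (height*width) x channels -> batch x channels x height x width."""
    batch, n_tokens, channels = tensor.shape
    if n_tokens != height * width:
        raise ValueError("token count must match the assumed height x width grid")
    grid = tensor.reshape(batch, height, width, channels)
    # Move the channel axis in front of the two spatial axes.
    return grid.permute(0, 3, 1, 2)


token_seq = torch.randn(2, 49, 16)
maps = reshape_transform(token_seq)
print(maps.shape)  # torch.Size([2, 16, 7, 7])
```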

Trotts commented 2 years ago

Hi, also sorry for my late reply.

Printing `target_layers` shows: `[Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))]`

Using torchsummary to get the expected output shapes for a (3, 300, 300) image:

```python
summary(loaded_model.module, (3, 300, 300))
```

```
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 59, 295, 295]           6,431
         MaxPool2d-2         [-1, 59, 147, 147]               0
              ReLU-3         [-1, 59, 147, 147]               0
           Dropout-4         [-1, 59, 147, 147]               0
            Conv2d-5         [-1, 59, 143, 143]          87,084
         MaxPool2d-6           [-1, 59, 71, 71]               0
            Linear-7                  [-1, 106]      31,526,520
              ReLU-8                  [-1, 106]               0
Total params: 31,620,035
Trainable params: 31,620,035
Non-trainable params: 0
Input size (MB): 1.03
Forward/backward pass size (MB): 79.83
Params size (MB): 120.62
Estimated Total Size (MB): 201.48
```

The model expects input of shape [1, 3, 300, 300]; images are resized to 300 x 300 beforehand:

`print(image.shape)` gives `(300, 300, 3)`.

`input_tensor.shape` gives `torch.Size([1, 3, 300, 300])`.

So I believe everything has the expected shape, at least before it is passed to the model/CAM, but I may have missed something?
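The shapes in the summary above can be checked by hand with PyTorch's output-size formula for `Conv2d` and `MaxPool2d` (floor mode), out = floor((in + 2·padding - dilation·(kernel - 1) - 1) / stride) + 1. This plain-Python sketch walks a 300 x 300 input through the two Conv2d/MaxPool2d pairs. It recovers the 143 x 143 map of the last `Conv2d`, which is already in the batch x channels x height x width layout the maintainer asked about, and the `in_features=297419` of the `Linear` layer, which is 59 x 71 x 71.

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for Conv2d/MaxPool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


size = 300
size = out_size(size, kernel=6)            # Conv2d(3, 59, 6)  -> 295
size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2, 2)   -> 147
size = out_size(size, kernel=5)            # Conv2d(59, 59, 5) -> 143
conv_out = size
size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2, 2)   -> 71

channels = 59
print(conv_out, size, channels * size * size)  # 143 71 297419
```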

Markson-Young commented 10 months ago


Hello! Have you solved this problem yet? I have the same problem.