jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License

RuntimeError using custom architecture #323

Open Trotts opened 2 years ago

Trotts commented 2 years ago

Hi,

I am trying to run Grad-CAM over a custom architecture I have created. The architecture is as follows:

Network(
  (convnet): Sequential(
    (0): Conv2d(3, 59, kernel_size=(6, 6), stride=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): Dropout(p=0.1694977121723289, inplace=False)
    (4): Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=297419, out_features=106, bias=True)
    (1): ReLU()
  )
)

This architecture is an embedding network, so I am using the [Pixel Attribution for Embeddings notebook](https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb) to try to generate a heatmap. For the moment I have it set to run on the default images.

When running the code for "Where is the car in the image", I hit the following error:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-9e1c4f908026> in <module>
     19              use_cuda=False) as cam:
     20     car_grayscale_cam = cam(input_tensor=input_tensor,
---> 21                         targets=car_targets)[0, :]
     22 
     23 

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in __call__(self, input_tensor, targets, aug_smooth, eigen_smooth)
    187 
    188         return self.forward(input_tensor,
--> 189                             targets, eigen_smooth)
    190 
    191     def __del__(self):

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in forward(self, input_tensor, targets, eigen_smooth)
     82             loss = sum([target(output)
     83                        for target, output in zip(targets, outputs)])
---> 84             loss.backward(retain_graph=True)
     85 
     86         # In most of the saliency attribution papers, the saliency is

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    305                 create_graph=create_graph,
    306                 inputs=inputs)
--> 307         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    308 
    309     def register_hook(self, hook):

~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    154     Variable._execution_engine.run_backward(
    155         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 156         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    157 
    158 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

From a few other closed threads on this issue, it seems there is something I need to do with:

with torch.no_grad()

However, I am at a complete loss as to where this needs to go, or whether it is a deeper problem with my custom embedding network. Any help would be greatly appreciated. I am running Python 3.6.9 and grad-cam version 1.4.5.

Code (dirs edited out):


import pickle
import numpy as np
import torch
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
# Network, find_best_checkpoint and get_image_from_url are defined elsewhere
# (my own code / the tutorial notebook) and are omitted here.

# A model wrapper that gets a model and returns the features before the fully connected layer.
class FeatureExtractor(torch.nn.Module):
    def __init__(self, model):
        super(FeatureExtractor, self).__init__()
        self.model = model
        # Keep every child module except the last one (the fc head).
        self.feature_extractor = torch.nn.Sequential(*list(self.model.children())[:-1])

    def __call__(self, x):
        # Take the features at spatial position (0, 0); in the tutorial the
        # backbone ends with a 1x1 spatial map, so this picks out the embedding.
        return self.feature_extractor(x)[:, :, 0, 0]

### Load the model with the optimal hyperparams previously located, based on best model checkpoint
best_params_src = $SRC

file = open(best_params_src ,'rb')
best_params = pickle.load(file)
file.close()

print(best_params)

resize_to = (300,300)
model_input_shape = [1, 3, resize_to[0], resize_to[1]] 

import torch.nn as nn

### Model
log_dir = $SRC

loader = torch.load(find_best_checkpoint(log_dir, n_way = True))
model_loader = loader['Model']

# Create a blank embedding model to load the checkpoint weights into
loaded_model = Network(nlayers=best_params['nlayers'],
                       hidden_size=best_params['hidden_size'],
                       kernel_size=best_params['kernel_size'],
                       dropout=best_params['dropout'],
                       expected_img_shape=model_input_shape,
                       emb_size=best_params['emb_size'])

# If multiple gpus then parallelise the model
if cuda and num_gpus > 1:
    loaded_model = nn.DataParallel(loaded_model)
    loaded_model.cuda()

# Load, set to eval, create FeatureExtractor
loaded_model.load_state_dict(model_loader)
loaded_model.eval()
model = FeatureExtractor(loaded_model)

car_img, car_img_float, car_tensor = get_image_from_url("https://www.wallpapersin4k.org/wp-content/uploads/2017/04/Foreign-Cars-Wallpapers-4.jpg")
cloud_img, cloud_img_float, cloud_tensor = get_image_from_url("https://th.bing.com/th/id/OIP.CmONj_pGCXg9Hq9-OxTD9gHaEo?pid=ImgDet&rs=1")
car_concept_features = model(car_tensor)[0, :]
cloud_concept_features = model(cloud_tensor)[0, :]

Image.fromarray(np.hstack((cloud_img, car_img)))

class SimilarityToConceptTarget:
    def __init__(self, features):
        self.features = features

    def __call__(self, model_output):
        cos = torch.nn.CosineSimilarity(dim=0)
        return cos(model_output, self.features)

target_layers = [loaded_model.module.convnet[-1]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                        targets=car_targets)[0, :]  # <---- ERROR HERE

car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
Image.fromarray(car_cam_image)
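
Since the error says the loss tensor does not require grad, one sanity check I can think of (a rough sketch, reusing the model, loaded_model and car_tensor defined above) is to confirm that the wrapped model's output is actually differentiable:

# Rough check: the output GradCAM backpropagates through should require grad.
out = model(car_tensor)
print(out.requires_grad)  # should be True; False means gradients are disabled somewhere
print(all(p.requires_grad for p in loaded_model.parameters()))  # should also be True
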
Trotts commented 2 years ago

Update:

I fixed the above error by calling

car_concept_features.requires_grad_()
cloud_concept_features.requires_grad_()

Before

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                        targets=car_targets)[0, :]

However, I now run into the following:

An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-5cdc1cb41639> in <module>
     21 
     22 
---> 23 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
     24 Image.fromarray(car_cam_image)

NameError: name 'car_grayscale_cam' is not defined

Looking back through the closed issues on this topic, it seems to be a problem with the layers I specify for target_layers (the NameError is just a knock-on effect: the exception inside the with block means car_grayscale_cam is never assigned) - am I correct in thinking this?

If so, I currently select loaded_model.module.convnet[-1] as the target, which corresponds to:

MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)

Any help with the above is greatly appreciated :)

jacobgil commented 2 years ago

Hi, can you please try loaded_model.module.convnet[-2] and tell me if it worked?

We need the 2D CNN activations before the pooling.

Trotts commented 2 years ago

Hi @jacobgil, I tried the suggestion but the same error occurs:

target_layers = [loaded_model.module.convnet[-2]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                        targets=car_targets)[0, :]

car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
Image.fromarray(car_cam_image)

Results in:

An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-13-784011a42087> in <module>
     20                         targets=car_targets)[0, :]
     21 
---> 22 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
     23 Image.fromarray(car_cam_image)

NameError: name 'car_grayscale_cam' is not defined

Am I correct in thinking the target_layers = [loaded_model.module.convnet[-2]] is where you wanted the change to be made?

jacobgil commented 2 years ago

Hi, sorry for the late response, I was traveling.

What does target_layers look like now? What output shape do you expect from that layer? The CAM algorithms expect it to have the shape batch x channels x height x width. Is that what we have there?

If the dimension is different, we will need to write a reshape_transform.

Also, what is the dimension of input_tensor?
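
For reference, a reshape_transform is just a small callable that rearranges the layer output into batch x channels x height x width before the CAM is computed. A rough sketch (the token handling and grid size below come from the ViT tutorial and are only illustrative; a CNN layer that already outputs B x C x H x W does not need one):

def reshape_transform(tensor, height=14, width=14):
    # Example for a transformer-style activation of shape (batch, tokens, channels):
    # drop the class token and fold the remaining tokens back into a 2D grid.
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    # Move channels to the second dimension, like a CNN feature map.
    return result.permute(0, 3, 1, 2)

# Passed to the constructor:
# cam = GradCAM(model=model, target_layers=target_layers, reshape_transform=reshape_transform)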

Trotts commented 2 years ago

Hi, also sorry for my late reply.

printing target_layers shows as: [Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))]

Using torchsummary to get the expected output for a (3, 300, 300) image:

summary(loaded_model.module, (3, 300, 300))

gives:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 59, 295, 295]           6,431
         MaxPool2d-2         [-1, 59, 147, 147]               0
              ReLU-3         [-1, 59, 147, 147]               0
           Dropout-4         [-1, 59, 147, 147]               0
            Conv2d-5         [-1, 59, 143, 143]          87,084
         MaxPool2d-6           [-1, 59, 71, 71]               0
            Linear-7                  [-1, 106]      31,526,520
              ReLU-8                  [-1, 106]               0
================================================================
Total params: 31,620,035
Trainable params: 31,620,035
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.03
Forward/backward pass size (MB): 79.83
Params size (MB): 120.62
Estimated Total Size (MB): 201.48
----------------------------------------------------------------

The model's expected input is [1, 3, 300, 300], with images resized to (300, 300) before input:

print(image.shape) gives (300, 300, 3).

input_tensor.shape gives torch.Size([1, 3, 300, 300])

So I believe everything is in the expected shape, at least before passing to the model/CAM; however, I may have missed something?
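
To double-check empirically, a quick forward hook on the chosen target layer (rough sketch, reusing loaded_model and input_tensor from above) prints the shape the CAM would actually see:

# Temporarily hook the target layer and record its output shape.
shapes = []
hook = loaded_model.module.convnet[-2].register_forward_hook(
    lambda module, inp, out: shapes.append(tuple(out.shape)))
with torch.no_grad():
    device = next(loaded_model.parameters()).device
    _ = loaded_model.module(input_tensor.to(device))
hook.remove()
print(shapes)  # expecting something like [(1, 59, 143, 143)]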

Markson-Young commented 10 months ago

Hello! Have you solved this problem now? I have the same problem.