jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License

CCT-MODEL: axis 3 is out of bounds for array of dimension 3 #344

Open rzamarefat opened 1 year ago

rzamarefat commented 1 year ago

Hi, thank you for this awesome repo. I am trying to get the GradCAM for the CCT (Compact Convolutional Transformer) taken from here. I give the model a tensor of shape [1, 3, 224, 224] and the following error comes up:

File "/home/marefat/projects/NSFW/venv/lib/python3.8/site-packages/numpy/core/_methods.py", line 78, in _count_reduce_items
    items *= arr.shape[mu.normalize_axis_index(ax, arr.ndim)]
numpy.AxisError: axis 3 is out of bounds for array of dimension 3

I am not sure about the following target_layers, but I have tried many different layers and I still get the error.

target_layers = [model.cct_model.classifier.blocks[-1].norm1]

Any help would be appreciated.

rzamarefat commented 1 year ago

I have included a reshape_transform in GradCAM and it solved the error. But how can I set the width and height arguments of the reshape_transform function correctly?
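For reference, a reshape_transform along these lines can look like the sketch below. It is modeled on the ViT example shipped with this repo, with the assumption that CCT's sequence pooling means there is no class token to strip; `height` and `width` must match the model's actual token grid:

```python
import torch

def reshape_transform(tensor, height=14, width=14):
    # tensor: (batch, num_tokens, channels) from a transformer block.
    # Assumption: CCT has no class token, so no token is dropped here
    # (for ViT-style models you would first slice off tensor[:, 0]).
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    # Bring channels first: (batch, channels, height, width),
    # the layout the CAM aggregation code expects.
    return result.permute(0, 3, 1, 2)
```

The `height * width` product has to equal the number of tokens, otherwise the reshape raises the "shape is invalid for input of size" error discussed below.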

jacobgil commented 1 year ago

Hi,

Can you please clarify the question:)

You can define your own reshape_transform and pass it. Do you know the width/height in advance ?
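One way to know the grid size in advance is to run a dummy forward pass and read the token count off the target layer's output with a forward hook. A hedged sketch (`token_count` is a hypothetical helper, and it assumes the layer emits a (batch, num_tokens, channels) tensor):

```python
import torch

def token_count(model, layer, input_size=(1, 3, 224, 224)):
    # Hypothetical helper: hook the target layer, push a dummy input
    # through the model, and read the number of tokens from the
    # recorded output shape, assumed to be (batch, num_tokens, channels).
    shapes = []
    handle = layer.register_forward_hook(lambda m, i, o: shapes.append(o.shape))
    with torch.no_grad():
        model(torch.zeros(input_size))
    handle.remove()
    return shapes[0][1]
```

The returned token count then constrains `height * width` for the reshape_transform.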

rzamarefat commented 1 year ago

The main problem is that I have provided the GradCAM instance with a reshape_transform function, but I don't know the suitable width and height. Imagine that I set the width and height to 10. The following error happens:

RuntimeError: shape '[1, 10, 10, 384]' is invalid for input of size 74880

If I set them to, for instance, width=20 and height=30, I get the following error:

RuntimeError: shape '[1, 30, 20, 384]' is invalid for input of size 74880

So my workaround is to divide 74880 by 384 and get 195. The number 195 can be factored as 13 * 15, so I set the width and height arguments of the reshape_transform function to 13 and 15 respectively, and it works. But as you can see, this is not based on any principled logic, and I think the resulting Grad-CAM I get using this approach may be completely misleading and incorrect.

Please note that the input of my model is (batch, 3, 224, 224)
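The divide-and-factor step described above can at least be automated. A minimal sketch (`infer_hw` is a hypothetical helper; the near-square assumption is a heuristic that may not match the grid the tokenizer actually produces):

```python
import math

def infer_hw(num_elements, channels):
    # Heuristic, not exact: token count = flat size / channels, then
    # pick the factor pair closest to a square grid. The tokenizer may
    # in fact produce a different (non-square) height/width split.
    num_tokens = num_elements // channels
    h = int(math.sqrt(num_tokens))
    while num_tokens % h != 0:
        h -= 1
    return h, num_tokens // h
```

For the numbers in this thread, `infer_hw(74880, 384)` reproduces the 13 x 15 split found by hand, but whether that split is semantically correct still depends on how the CCT tokenizer lays out its patches.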

rzamarefat commented 1 year ago

Any ideas for solving the issue would be appreciated.

Rainydu184 commented 1 year ago

Have you solved this problem?