Hi. I am facing issues with Grad-CAM on my custom ViT model. I followed the Vision Transformer tutorial here and tried to adapt it to my model. I got DFF working, but Grad-CAM throws an error. My model's structure differs from the tutorial's, so I am unsure whether I am choosing the wrong target layer or whether something is wrong with the input tensor. I also tried the solution here, but it doesn't work. I am using google/vit-base-patch16-224-in21k.
Thank you.
This is the error:
An exception occurred in CAM with block: <class 'numpy.exceptions.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[149], line 27
17 tensor_resized1 = tensor_resized1
19 display(Image.fromarray(run_dff_on_image(model=vit_model.model,
20 target_layer=target_layer_dff,
21 classifier=vit_model.classifier,
(...)
25 n_components=3,
26 top_k=3)))
---> 27 display(Image.fromarray(run_grad_cam_on_image(model=vit_model.model,
28 target_layer=target_layer_gradcam,
29 targets_for_gradcam=targets_for_gradcam,
30 input_tensor=tensor_resized1,
31 input_image=image_resized1,
32 reshape_transform=reshape_transform_vit_huggingface)))
33 print_top_categories(model, tensor_resized1)
File ~/anaconda3/envs/env-pytorch/lib/python3.10/site-packages/PIL/Image.py:3119, in fromarray(obj, mode)
3072 def fromarray(obj, mode=None):
3073 """
3074 Creates an image memory from an object exporting the array interface
3075 (using the buffer protocol)::
(...)
3117 .. versionadded:: 1.1.6
3118 """
-> 3119 arr = obj.__array_interface__
3120 shape = arr["shape"]
3121 ndim = len(shape)
AttributeError: 'NoneType' object has no attribute '__array_interface__'
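From what I can tell (my reading, not confirmed): numpy's AxisError subclasses IndexError, which the CAM context manager appears to suppress, so run_grad_cam_on_image seems to return None, and the outer AttributeError is just Image.fromarray(None) failing. A minimal reproduction of that outer error:

```python
from PIL import Image

# Image.fromarray expects an object exporting the array interface;
# passing None reproduces the outer AttributeError from the traceback.
try:
    Image.fromarray(None)
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute '__array_interface__'
```

So the real problem is the inner AxisError inside the CAM computation, not the PIL call.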
Code for grad-cam:
def reshape_transform(tensor, height=14, width=14):
    result = tensor[:, 1:, :].reshape(tensor.size(0),
                                      height, width, tensor.size(2))
    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result
target_layer_dff = vit_model.model.layernorm
target_layer_gradcam = vit_model.model.encoder.layer[-1].layernorm_before
image_resized1 = pil_img.resize((224, 224))
tensor_resized1 = transforms.ToTensor()(image_resized1)
tensor_resized1 = tensor_resized1
display(Image.fromarray(run_dff_on_image(model=vit_model.model,
                                         target_layer=target_layer_dff,
                                         classifier=vit_model.classifier,
                                         img_pil=image_resized1,
                                         img_tensor=tensor_resized1,
                                         reshape_transform=reshape_transform,
                                         n_components=3,
                                         top_k=3)))
display(Image.fromarray(run_grad_cam_on_image(model=vit_model.model,
                                              target_layer=target_layer_gradcam,
                                              targets_for_gradcam=targets_for_gradcam,
                                              input_tensor=tensor_resized1,
                                              input_image=image_resized1,
                                              reshape_transform=reshape_transform)))
print_top_categories(model, tensor_resized1)
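For reference, my reshape_transform does behave as expected on dummy data shaped like the activations of google/vit-base-patch16-224-in21k (1 CLS token + 14x14 patch tokens, hidden size 768), so I suspect the problem is elsewhere:

```python
import torch

def reshape_transform(tensor, height=14, width=14):
    # Drop the CLS token (index 0), then fold the 196 patch tokens
    # back into a height x width spatial grid.
    result = tensor[:, 1:, :].reshape(tensor.size(0),
                                      height, width, tensor.size(2))
    # Bring the channels to the first dimension, like in CNNs: (B, C, H, W).
    result = result.transpose(2, 3).transpose(1, 2)
    return result

# vit-base-patch16-224: 197 tokens (1 CLS + 196 patches), hidden size 768
tokens = torch.randn(2, 197, 768)
print(reshape_transform(tokens).shape)  # torch.Size([2, 768, 14, 14])
```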