Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Hi, I'm working on attention mechanisms for face recognition models. I'm using the ir model as a backbone, but I don't know much about the implementation details of Grad-CAM. What exactly should I do? Do none of the targets defined in pytorch_grad_cam.utils.model_targets apply to face recognition and verification tasks? How should I reason about generating the attention maps? Is it possible to customize targets, e.g. one based on cosine_similarity?
Here's how I realized it:

grayscale_cams = cam(input_tensor=input_tensor, targets=targets)

I get the following error:

... ...
"torch/autograd/__init__.py", line 50, in _make_grads
RuntimeError: grad can be implicitly created only for scalar outputs
Here's the model:
class Backbone(Module):
    def __init__(self, input_size, num_layers, mode='ir'):
        super(Backbone, self).__init__()
        assert input_size[0] in [112, 224], "input_size should be [112, 112] or [224, 224]"
        assert num_layers in [50, 100, 152], "num_layers should be 50, 100 or 152"
        assert mode in ['ir', 'ir_se'], "mode should be ir or ir_se"
        blocks = get_blocks(num_layers)
        if mode == 'ir':
            unit_module = bottleneck_IR
        elif mode == 'ir_se':
            unit_module = bottleneck_IR_SE
        self.input_layer = Sequential(Conv2d(3, 64, (3, 3), 1, 1, bias=False),
                                      BatchNorm2d(64),
                                      PReLU(64))
        if input_size[0] == 112:
            self.output_layer = Sequential(BatchNorm2d(512),
                                           Dropout(0.4),
                                           Flatten(),
                                           Linear(512 * 7 * 7, 512),
                                           # BatchNorm1d(512, affine=False))
                                           BatchNorm1d(512))
        else:
            self.output_layer = Sequential(BatchNorm2d(512),
                                           Dropout(0.4),
                                           Flatten(),
                                           Linear(512 * 14 * 14, 512),
                                           # BatchNorm1d(512, affine=False))
                                           BatchNorm1d(512))
        modules = [unit_module(bottleneck.in_channel, bottleneck.depth, bottleneck.stride)
                   for block in blocks for bottleneck in block]
        self.body = Sequential(*modules)
        self._initialize_weights()

    def forward(self, x):
        x = self.input_layer(x)
        x = self.body(x)
        conv_out = x.view(x.shape[0], -1)
        x = self.output_layer(x)
        # norm = torch.norm(x, p=2, dim=1)
        # x = torch.div(x, norm)
        # return x, conv_out
        return x


def IR_152(input_size):
    model = Backbone(input_size, 152, 'ir')
    return model
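For what it's worth, the targets in pytorch_grad_cam.utils.model_targets are essentially callables that map the model's output to a scalar per image, so a cosine-similarity target can be written by hand. Below is a minimal sketch, assuming the backbone returns a 512-d embedding; the class name `SimilarityToConceptTarget` and the `reference_embedding` argument are my own naming, not part of the library:

```python
import torch
import torch.nn.functional as F


class SimilarityToConceptTarget:
    """Hypothetical Grad-CAM target: scores an embedding by its cosine
    similarity to a fixed reference embedding, so that the CAM call
    backpropagates a scalar per image (avoiding the
    'grad can be implicitly created only for scalar outputs' error)."""

    def __init__(self, reference_embedding):
        # reference_embedding: 1-D tensor, e.g. the embedding of a
        # gallery face you want to compare against
        self.reference = reference_embedding

    def __call__(self, model_output):
        # model_output: the embedding the backbone produced for one image
        return F.cosine_similarity(model_output, self.reference, dim=-1)
```

With this, something like `cam(input_tensor=img, targets=[SimilarityToConceptTarget(ref_embedding)])` should produce a map highlighting the regions that drive the similarity between the probe image and the reference face.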