jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License

about cam for object detection #215

Closed JumeLin closed 2 years ago

JumeLin commented 2 years ago

File "H:/codefile/personal/yolov5/pytorch-grad-cam-master/my_cam.py", line 144, in reshape_transform=None) TypeError: EigenCAM() takes no arguments

I found that 'fasterrcnn_reshape_transform' was defined but never used when I tried to implement Faster R-CNN. Then this error came up.
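
For readers who hit the same message: in plain Python, "TypeError: X() takes no arguments" is raised when a class that defines no usable __init__ is called with arguments. One common way to get there is a misspelled constructor name. The sketch below reproduces that with two hypothetical classes (BrokenCAM and WorkingCAM are illustration names, not part of pytorch-grad-cam); whether this was the cause in this thread is an assumption, since the failing code was never posted.

```python
class BrokenCAM:
    # Hypothetical class: the constructor name is misspelled (single
    # underscores), so Python never treats it as the initializer.
    def _init_(self, model, target_layers, reshape_transform=None):
        self.model = model


class WorkingCAM:
    # Same hypothetical class with the constructor spelled correctly.
    def __init__(self, model, target_layers, reshape_transform=None):
        self.model = model
        self.target_layers = target_layers
        self.reshape_transform = reshape_transform


def try_construct(cls):
    """Return the error message raised by cls(...), or None on success."""
    try:
        cls(model="model", target_layers=["layer"], reshape_transform=None)
    except TypeError as exc:
        return str(exc)
    return None


broken_message = try_construct(BrokenCAM)
working_message = try_construct(WorkingCAM)
print(broken_message)
print(working_message)
```

If the class comes from an installed package, it may also be worth checking that a local file or folder with the same name is not shadowing it on the import path.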

jacobgil commented 2 years ago

Hi, sorry for the late reply. Can you please share more details, maybe the code snippet where you construct the CAM object?

JumeLin commented 2 years ago

Thank you very much. Please forget that question, it was a silly one. I have another question: when I use Grad-CAM to produce a heatmap for YOLOv5 object detection, something goes wrong, as follows:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 3, 20, 20, 12]], which is output 0 of SigmoidBackward, is at version 2; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I don't know how to solve this. I suspect the function "get_loss" in my code is wrong.

class GradCAM:
    def __init__(self,
                 model,
                 target_layers,
                 reshape_transform=None,
                 use_cuda=False):
        self.model = model.eval()
        self.target_layers = target_layers
        self.reshape_transform = reshape_transform
        self.cuda = use_cuda
        if self.cuda:
            self.model = model.cuda()
        self.activations_and_grads = ActivationsAndGradients(self.model, target_layers, reshape_transform)

    """ Get a vector of weights for every channel in the target layer.
        Methods that return weights channels,
        will typically need to only implement this function. """

    @staticmethod
    def get_cam_weights(grads):
        return np.mean(grads, axis=(2, 3), keepdims=True)

    @staticmethod
    def get_loss(output, target_category):
        loss = 0
        # print(target_category)
        for i in range(len(target_category)):
            output = output[i]
            # print(output.size())
            loss = loss + output[:, 5 + target_category[i]]
        return loss

    def get_cam_image(self, activations, grads):
        weights = self.get_cam_weights(grads)
        weighted_activations = weights * activations
        cam = weighted_activations.sum(axis=1)

        return cam

    @staticmethod
    def get_target_width_height(input_tensor):
        width, height = input_tensor.size(-1), input_tensor.size(-2)
        return width, height

    def compute_cam_per_layer(self, input_tensor):
        activations_list = [a.cpu().data.numpy()
                            for a in self.activations_and_grads.activations]
        grads_list = [g.cpu().data.numpy()
                      for g in self.activations_and_grads.gradients]
        target_size = self.get_target_width_height(input_tensor)

        cam_per_target_layer = []
        # Loop over the saliency image from every layer

        for layer_activations, layer_grads in zip(activations_list, grads_list):
            cam = self.get_cam_image(layer_activations, layer_grads)
            cam[cam < 0] = 0  # clamp negative values to zero before scale_cam_image applies min-max scaling
            scaled = self.scale_cam_image(cam, target_size)
            cam_per_target_layer.append(scaled[:, None, :])

        return cam_per_target_layer

    def aggregate_multi_layers(self, cam_per_target_layer):
        cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1)
        cam_per_target_layer = np.maximum(cam_per_target_layer, 0)
        result = np.mean(cam_per_target_layer, axis=1)
        return self.scale_cam_image(result)

    @staticmethod
    def scale_cam_image(cam, target_size=None):
        result = []
        for img in cam:
            img = img - np.min(img)
            img = img / (1e-7 + np.max(img))
            if target_size is not None:
                img = cv2.resize(img, target_size)
            result.append(img)
        result = np.float32(result)

        return result

    def __call__(self, input_tensor, target_category=None):

        if self.cuda:
            input_tensor = input_tensor.cuda()

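        # Forward pass to get the network output logits (before softmax).
        # Here output = self.activation_and_grads(input_tensor) runs the forward pass on the input and registers the forward and backward hook functions,
        # yielding the activations and the backward gradients.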
        output = self.activations_and_grads(input_tensor)[0]

        # debug
        # From YOLOv5's predict function, the prediction output is pred=model(img, augment=augment, visualize=visualize)[0], and the size of pred
        # is (bs, num_box, xywh+obj_conf+obj_classes)

        # Check whether target_category is an int
        if isinstance(target_category, int):
            # If so, repeat target_category once for every item in the batch
            target_category = [target_category] * input_tensor.size(0)

        if target_category is None:
            # If target_category is None, use the class with the highest predicted score
            target_category = np.argmax(output.cpu().data.numpy(), axis=-1)
            print(f"category id: {target_category}")
        else:
            assert (len(target_category) == input_tensor.size(0))

        self.model.zero_grad()
        loss = self.get_loss(output, target_category)
        loss.backward(retain_graph=True)
        cam_per_layer = self.compute_cam_per_layer(input_tensor)
        return self.aggregate_multi_layers(cam_per_layer)

    def __del__(self):
        self.activations_and_grads.release()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.activations_and_grads.release()
        if isinstance(exc_value, IndexError):
            # Handle IndexError here...
            print(
                f"An exception occurred in CAM with block: {exc_type}. Message: {exc_value}")
            return True
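
A note on the RuntimeError quoted above: the backward pass of sigmoid reuses the sigmoid output, so writing into that output in place afterwards invalidates what autograd saved. The sketch below reproduces the message on a small tensor and shows an out-of-place alternative. The functions decode_inplace and decode_out_of_place are hypothetical stand-ins for a detection head that decodes boxes from a sigmoid output; that the YOLOv5 head used here behaves this way is an assumption, not something confirmed in this thread.

```python
import torch


def decode_inplace(raw):
    # Writing into a slice of the sigmoid output changes the tensor that
    # the sigmoid backward node saved for the backward pass.
    y = torch.sigmoid(raw)
    y[..., 0:2] = y[..., 0:2] * 2.0 - 0.5
    return y


def decode_out_of_place(raw):
    # Build a new tensor instead of overwriting the sigmoid output.
    y = torch.sigmoid(raw)
    xy = y[..., 0:2] * 2.0 - 0.5
    return torch.cat((xy, y[..., 2:]), dim=-1)


def backward_error(decode):
    """Run forward + backward and return the error text, or None on success."""
    raw = torch.randn(1, 3, 4, 4, 12, requires_grad=True)
    try:
        decode(raw).sum().backward()
    except RuntimeError as exc:
        return str(exc)
    return None


inplace_message = backward_error(decode_inplace)
fixed_message = backward_error(decode_out_of_place)
print(inplace_message)
print(fixed_message)
```

If the in-place write lives inside the model rather than in your own code, calling torch.autograd.set_detect_anomaly(True) before the forward pass makes the traceback point at the forward operation responsible.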


jacobgil commented 2 years ago

The get_loss function seems strange. What is the intention of the line "output = output[i]"?
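
To make the concern concrete: rebinding "output" inside the loop means the second iteration no longer indexes the batch but a single image's boxes, and the accumulated value is a vector rather than a scalar, which backward() cannot take without an explicit gradient. Below is a minimal sketch of one possible rewrite. It assumes the prediction tensor has shape (batch, num_boxes, 5 + num_classes), as described in the posted code comments, and it assumes that summing the target-class score over all boxes is an acceptable scalar to backpropagate. That choice is an illustration, not the library's own method for detection models.

```python
import torch


def get_loss(output, target_category):
    """Sum the target-class scores of every box, image by image.

    Assumes output has shape (batch, num_boxes, 5 + num_classes), with
    columns 0-4 holding xywh and objectness.
    """
    loss = 0
    for i, category in enumerate(target_category):
        image_pred = output[i]  # keep `output` itself untouched
        loss = loss + image_pred[:, 5 + category].sum()
    return loss


# Dummy predictions: 2 images, 4 boxes, 5 box fields + 3 classes.
pred = torch.arange(2 * 4 * 8, dtype=torch.float32).reshape(2, 4, 8)
pred = pred.clone().requires_grad_(True)
loss = get_loss(pred, [0, 2])
loss.backward()
print(loss.item(), pred.grad.sum().item())
```

Summing over every box weights all candidate boxes equally; restricting the sum to boxes that survive confidence filtering would be another reasonable choice.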

jacobgil commented 2 years ago

Closing this issue, please re-open if relevant.