open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0
4.03k stars · 1.2k forks

I got error when using grad cam for timeSformer #1058

Open YNawal opened 2 years ago

YNawal commented 2 years ago

Hello, I'm trying to visualize the Grad-CAM of my TimeSformer model. I tried different target layer names, but without success. The error is either

    AttributeError: 'NoneType' object has no attribute 'size'

or

    _, c, tg, _, _ = gradients.size()
    ValueError: not enough values to unpack (expected 5, got 3)

I'm using a rawframes dataset. Thanks for your help.
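
For context, a minimal sketch (not mmaction2 code) of why the second error appears: mmaction2's GradCAM expects the hooked layer to emit a 5-D [B, C', Tg, H', W'] feature map, while a TimeSformer block emits a 3-D token sequence [B, N, C], so the five-way unpack fails. The AttributeError usually means the hook never captured a gradient at all, leaving it None. The shapes below are illustrative assumptions.

    import torch

    # What mmaction2's GradCAM assumes the target layer yields:
    cnn_like = torch.randn(1, 768, 8, 14, 14)      # [B, C', Tg, H', W']
    _, c, tg, _, _ = cnn_like.size()               # unpacks fine

    # What a TimeSformer block actually yields (1 cls token + 8*14*14 patches):
    tokens = torch.randn(1, 1 + 8 * 14 * 14, 768)  # [B, N, C]
    try:
        _, c, tg, _, _ = tokens.size()
    except ValueError as err:
        print(err)  # not enough values to unpack (expected 5, got 3)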

YNawal commented 2 years ago

@kennymckormick Could you please look into my problem?

kennymckormick commented 2 years ago

GradCAM doesn't support transformer-based models yet. @irvingzhang0512 do you have some time to look at this problem?

YNawal commented 2 years ago

I don't know whether reshaping the gradients would make sense.

irvingzhang0512 commented 2 years ago

> GradCAM doesn't support transformer-based models yet. @irvingzhang0512 do you have some time to look at this problem?

I'll take a look in August

Tortoise17 commented 2 years ago

I am also facing the same issue with the Kinetics dataset for training.

yehuixie commented 12 months ago

Does GradCAM in mmaction2 support transformer-based models?

ZechengLi19 commented 10 months ago

I edited the gradcam_utils.py file (line 119) as follows to support transformer-based models.

    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        if len(grad) == 3:
            # transformer tokens: [B, N, C]
            _, tg, c = grad

            # Edit feature_h and feature_w to match your transformer:
            # for a ViT-style patch embedding, img_size // patch_size.
            feature_h = 14
            feature_w = 14

            tg //= feature_h * feature_w
            gradients = gradients.reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations.reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape: [B, C', Tg, H', W']
            _, c, tg, _, _ = grad
            # target shape: [B, Tg, C', H', W']
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
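
A note on the hard-coded grid: for a ViT-style patch embedding, the spatial grid is img_size // patch_size per side, so TimeSformer's default 224×224 input with 16×16 patches gives 14, matching the value above. A small hedged sketch of that arithmetic (plain Python, no mmaction2 API):

    # Assumed ViT-style, non-overlapping square patches.
    img_size = 224
    patch_size = 16
    feature_h = feature_w = img_size // patch_size  # 224 // 16 = 14

The --target-layer-name values in this thread are '/'-separated module paths, which is how mmaction2's GradCAM demo addresses submodules.
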
EaaloZ commented 4 months ago

@ZechengLi19 Thank you very much! I tried your modification, but something went wrong. First, I assume that feature_h = img_size / patch_size. Second, I set the target-layer-name parameter to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. With these settings, the resulting video did not show any distinct or notable regions. I'd appreciate any suggestions!

            grad = gradients.size()
            if len(grad) == 3:
                # transformer tokens: [B, N, C]
                _, tg, c = grad

                # Edit feature_h and feature_w to match your transformer
                feature_h = 14
                feature_w = 14

                tg //= feature_h * feature_w
                # drop the first (cls) token so the reshape divides evenly
                gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
                gradients = gradients.permute(0, 1, 4, 2, 3)
                activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
                activations = activations.permute(0, 1, 4, 2, 3)
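
The [:, 1:, :] slice matters because with a class token prepended, N = 1 + Tg * H' * W', which is not divisible by the spatial grid. A minimal check of that arithmetic (illustrative shapes, assuming a single prepended cls token):

    import torch

    b, tg, feature_h, feature_w, c = 1, 8, 14, 14, 768
    n = 1 + tg * feature_h * feature_w       # 1569 tokens incl. cls token
    tokens = torch.randn(b, n, c)

    print(n % (feature_h * feature_w) == 0)  # False: reshape would fail
    grid = tokens[:, 1:, :].reshape(b, tg, feature_h, feature_w, c)
    print(grid.shape)                        # torch.Size([1, 8, 14, 14, 768])
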
ZechengLi19 commented 4 months ago

> @ZechengLi19 Thank you very much! I tried your modification, but something went wrong. First, I assume that feature_h = img_size / patch_size. Second, I set the target-layer-name parameter to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. With these settings, the resulting video did not show any distinct or notable regions. I'd appreciate any suggestions!

Maybe you can share your gradcam_utils.py file with me, so I can help you find the bug.

yehuixie commented 4 months ago

> @ZechengLi19 Thank you very much! I tried your modification, but something went wrong. First, I assume that feature_h = img_size / patch_size. Second, I set the target-layer-name parameter to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. With these settings, the resulting video did not show any distinct or notable regions. I'd appreciate any suggestions!

I modified line 126 of gradcam_utils.py and solved the problem by adding two lines of code (see below). Maybe you can try this method. By the way, I found it in the GitHub repository https://github.com/MartinXM/TPS.

    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        # assumed hook output is channel-last: [B, Tg, H', W', C'];
        # the two added lines bring it to [B, C', Tg, H', W']
        gradients = gradients.permute(0, 4, 1, 2, 3)
        activations = activations.permute(0, 4, 1, 2, 3)
        _, c, tg, _, _ = gradients.size()
        # target shape: [B, Tg, C', H', W']
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
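
For clarity, a shape-flow sketch of what those two added lines assume: the hooked layer must emit a channel-last 5-D tensor [B, Tg, H', W', C'], as the TPS backbone does. The shapes are illustrative.

    import torch

    x = torch.randn(1, 8, 14, 14, 768)  # assumed hook output: [B, Tg, H', W', C']
    x = x.permute(0, 4, 1, 2, 3)        # added lines  -> [B, C', Tg, H', W']
    x = x.permute(0, 2, 1, 3, 4)        # existing code -> [B, Tg, C', H', W']
    print(x.shape)                      # torch.Size([1, 8, 768, 14, 14])
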
EaaloZ commented 4 months ago

> I modified line 126 of gradcam_utils.py and solved the problem by adding two lines of code. Maybe you can try this method. By the way, I found it in the GitHub repository https://github.com/MartinXM/TPS.

Thank you for your suggestions! When I set the target-layer-name to 'backbone/transformer_layers/layers/11/ffns/0/layers/1', gradients has only 3 dimensions, so the 5-way permute fails. I have no idea what happened; maybe I set the target-layer-name incorrectly.
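
One way to rule out a wrong target-layer-name is to print the model's module paths and pick a layer whose output has the shape the code expects. A hedged sketch using mmaction2's standard init_recognizer API; the config and checkpoint paths are placeholders:

    from mmaction.apis import init_recognizer

    # Placeholders: substitute your own config and checkpoint.
    model = init_recognizer('timesformer_config.py', 'checkpoint.pth', device='cpu')

    # GradCAM addresses submodules with '/'-separated paths.
    for name, _ in model.named_modules():
        print(name.replace('.', '/'))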

EaaloZ commented 4 months ago

@ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences are on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) of the gradients and activations so the reshape works. Here are the modifications:


    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        if len(grad) == 3:
            # transformer tokens: [B, N, C]
            _, tg, c = grad

            # Edit feature_h and feature_w to match your transformer
            feature_h = 14
            feature_w = 14

            tg //= feature_h * feature_w
            # drop the first (cls) token so the reshape divides evenly
            gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape: [B, C', Tg, H', W']
            _, c, tg, _, _ = grad
            # target shape: [B, Tg, C', H', W']
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
ZechengLi19 commented 4 months ago

> @ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences are on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) of the gradients and activations so the reshape works.

Does this mean the code is already working?

EaaloZ commented 4 months ago

> Does this mean the code is already working?

Yes, but the results are meaningless: they show no sign of any salient region in the input video.
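
One quick sanity check before blaming the layer choice is to verify the raw localization map is not simply flat. A minimal sketch; cam is a hypothetical [Tg, H', W'] Grad-CAM map taken before upsampling and blending:

    import torch

    def cam_is_flat(cam: torch.Tensor, eps: float = 1e-4) -> bool:
        """A map whose per-frame spread is near zero carries no signal."""
        flat = cam.flatten(1)
        spread = flat.max(dim=1).values - flat.min(dim=1).values
        return bool((spread < eps).all())

    # cam = ...  # [Tg, H', W'] map from GradCAM before resizing
    # print(cam_is_flat(cam))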