Open YNawal opened 3 years ago
@kennymckormick Could you please take a look at my problem?
GradCAM hasn't supported the transformer-based models yet. @irvingzhang0512 do you have some time to look at this problem?
I don't know whether reshaping the gradients would make sense.
I'll take a look in August
I am also facing the same issue with the Kinetics dataset during training.
Does GradCAM in mmaction2 support transformer-based models?
I edited the gradcam_utils.py file (line 119) in the following way to support transformer-based models.
gradients = self.target_gradients
activations = self.target_activations
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    grad = gradients.size()
    # implementation for transformer-based models
    if len(grad) == 3:
        _, tg, c = grad
        # Edit feature_h and feature_w to match your transformer
        feature_h = 14
        feature_w = 14
        tg //= (feature_h * feature_w)
        # [B, Tg*H'*W', C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
        gradients = gradients.reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations.reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)
    elif len(grad) == 5:
        _, c, tg, _, _ = grad
        # target shape: [B, Tg, C', H', W']
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
    else:
        raise NotImplementedError('Please check the gradient shape')
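The shape arithmetic in the snippet above can be checked on shapes alone. A minimal sketch in plain Python, assuming a ViT-style backbone with 14x14 patch tokens (e.g. 224 input / 16 patch); the helper name is mine, not part of gradcam_utils.py:

```python
def transformer_grad_target_shape(grad_shape, feature_h=14, feature_w=14):
    """Map a gradient shape to the GradCAM target shape [B, Tg, C', H', W'].

    3-D input: [B, Tg*H'*W', C'] from a transformer block.
    5-D input: [B, C', Tg, H', W'] from a 3D-CNN recognizer.
    """
    if len(grad_shape) == 3:
        b, n_tokens, c = grad_shape
        # number of temporal groups per sample
        tg = n_tokens // (feature_h * feature_w)
        # reshape to [B, Tg, H', W', C'], then permute(0, 1, 4, 2, 3)
        return (b, tg, c, feature_h, feature_w)
    if len(grad_shape) == 5:
        b, c, tg, h, w = grad_shape
        # permute(0, 2, 1, 3, 4) swaps the channel and temporal dims
        return (b, tg, c, h, w)
    raise NotImplementedError('Please check the gradient shape')
```

For example, 8 frames of 14x14 tokens with 768 channels, `(1, 8 * 196, 768)`, maps to `(1, 8, 768, 14, 14)`.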
@ZechengLi19 Thank you very much! I tried your modification but something went wrong. First, I assume that feature_h = img_size / patch_size. Second, the target-layer-name parameter is set to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. Under these settings, the resulting video did not exhibit any distinct or notable regions. I'd appreciate it very much if you could give me some suggestions!
grad = gradients.size()
# implementation for transformer-based models
if len(grad) == 3:
    _, tg, c = grad
    # Edit feature_h and feature_w to match your transformer
    feature_h = 14
    feature_w = 14
    tg //= (feature_h * feature_w)
    # drop the first (cls) token before reshaping
    gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
    gradients = gradients.permute(0, 1, 4, 2, 3)
    activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
    activations = activations.permute(0, 1, 4, 2, 3)
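The [:, 1:, :] slice drops the first token before reshaping. A small sketch of the token arithmetic in plain Python; the helper name and the single-cls-token assumption are mine, not from the thread:

```python
def usable_tokens(n_tokens, feature_h=14, feature_w=14, drop_cls_token=True):
    """Return (tg, remaining_tokens) after optionally dropping one cls_token."""
    if drop_cls_token:
        n_tokens -= 1  # mirrors gradients[:, 1:, :]
    spatial = feature_h * feature_w
    if n_tokens % spatial != 0:
        raise ValueError('token count is not divisible by feature_h * feature_w')
    return n_tokens // spatial, n_tokens
```

With 8 temporal groups of 14x14 patches plus one cls_token, `usable_tokens(1 + 8 * 196)` gives 8 groups and 1568 spatial tokens; without dropping the cls_token, the 1569 tokens would not divide evenly into 14x14 maps.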
Maybe you can share your gradcam_utils.py file with me, so that I can help you find the bug.
I modified line 126 of gradcam_utils.py and solved the problem by adding two lines of code. Maybe you can try this method. By the way, I found this method in this GitHub repository: https://github.com/MartinXM/TPS.
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    # transformer hook output [B, Tg, H', W', C'] -> [B, C', Tg, H', W']
    gradients = gradients.permute(0, 4, 1, 2, 3)
    activations = activations.permute(0, 4, 1, 2, 3)
    _, c, tg, _, _ = gradients.size()
    # target shape: [B, Tg, C', H', W']
    gradients = gradients.permute(0, 2, 1, 3, 4)
    activations = activations.permute(0, 2, 1, 3, 4)
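On shapes alone, the two added permutes followed by the existing one compose to a single permute. A quick check with hypothetical shapes, assuming the hook output is [B, Tg, H', W', C']:

```python
def apply_permute(shape, dims):
    """Reorder a shape tuple the way torch.Tensor.permute reorders dimensions."""
    return tuple(shape[d] for d in dims)

src = (1, 8, 14, 14, 768)                      # assumed hook output [B, Tg, H', W', C']
step1 = apply_permute(src, (0, 4, 1, 2, 3))    # added lines: [B, C', Tg, H', W']
step2 = apply_permute(step1, (0, 2, 1, 3, 4))  # existing line: [B, Tg, C', H', W']
# The two permutes compose to a single permute(0, 1, 4, 2, 3)
```

Here `step1` is (1, 768, 8, 14, 14) and `step2` is (1, 8, 768, 14, 14), the same as permuting `src` once with (0, 1, 4, 2, 3).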
Thank you for your suggestions! When I set target-layer-name to 'backbone/transformer_layers/layers/11/ffns/0/layers/1', the gradients tensor only has 3 dimensions, so unpacking and permuting 5 dimensions fails. I have no idea what happened; maybe I set the wrong target-layer-name.
@ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences lie on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) in the gradients and activations so that the reshape works. Here are the modifications:
gradients = self.target_gradients
activations = self.target_activations
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    grad = gradients.size()
    # implementation for transformer-based models
    if len(grad) == 3:
        _, tg, c = grad
        # Edit feature_h and feature_w to match your transformer
        feature_h = 14
        feature_w = 14
        tg //= (feature_h * feature_w)
        # drop the first (cls) token before reshaping
        gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)
    elif len(grad) == 5:
        _, c, tg, _, _ = grad
        # target shape: [B, Tg, C', H', W']
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
    else:
        raise NotImplementedError('Please check the gradient shape')
Does this mean the code is already working?
Yes, but the results are meaningless. They show no salient regions in the input video.
Hello, I'm trying to visualize the Grad-CAM of my TimeSformer model. I tried different target layer names, but without success. The error is either: AttributeError: 'NoneType' object has no attribute 'size' or: _, c, tg, _, _ = gradients.size() ValueError: not enough values to unpack (expected 5, got 3)
I'm using a rawframes dataset. Thanks for your help.