jacobgil / vit-explain

Explainability for Vision Transformers

Gradients in Grad Rollout #6

Open ilovecv opened 3 years ago

ilovecv commented 3 years ago

Hi, for the gradients list built in this function: https://github.com/jacobgil/vit-explain/blob/main/vit_grad_rollout.py#L9 do we need to reverse the gradients? The attentions are accumulated during the forward pass, while the gradients are accumulated during the backward pass, so the two lists end up in opposite layer orders. To multiply each attention map with its corresponding gradient, we need to reverse the gradient list: `gradients = gradients[::-1]`.

What do you think? Thanks
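
For illustration, here is a minimal sketch of the proposed fix, assuming the rollout loop pairs `attentions[i]` with `gradients[i]` by index (via `zip`, as in the `grad_rollout` function linked above). The exact head fusion and discard logic in the repository may differ; only the `gradients[::-1]` reversal is the point here.

```python
import torch

def grad_rollout(attentions, gradients):
    """Sketch of gradient-weighted attention rollout with the proposed fix.

    `attentions` is filled by forward hooks (first block first);
    `gradients` is filled by backward hooks (last block first),
    so the gradient list is reversed before pairing the two.
    """
    # Proposed fix: align gradients[i] with the block that produced attentions[i].
    gradients = gradients[::-1]

    result = torch.eye(attentions[0].size(-1))
    with torch.no_grad():
        for attention, grad in zip(attentions, gradients):
            # Weight each attention map by its gradient, keep positive
            # contributions, and fuse the heads by averaging.
            fused = (attention * grad).clamp(min=0).mean(dim=1)
            # Add the identity for the residual connection, renormalize,
            # and accumulate the rollout across blocks.
            fused = fused + torch.eye(fused.size(-1))
            fused = fused / fused.sum(dim=-1, keepdim=True)
            result = torch.matmul(fused, result)
    # Attention flowing from the class token to the image patches.
    return result[0, 0, 1:]
```

Reversing the list once at the top of the rollout is equivalent to prepending (rather than appending) `grad_input[0]` inside the backward hook, so either change would align the two lists.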

ymp5078 commented 2 years ago

I agree with @ilovecv.

I checked the id of each attention module using the following hooks and found that the backward order is reversed relative to the forward order. Please let me know if you have any thoughts.

```python
def get_attention(self, module, input, output):
    print('f', id(module))
    self.attentions.append(output.cpu())

def get_attention_gradient(self, module, grad_input, grad_output):
    print('b', id(module))
    self.attention_gradients.append(grad_input[0].cpu())
```

The output shows the forward hooks firing from the first block to the last, and the backward hooks firing in the reverse order:

```
f 140200126500192
f 140200126548912
f 140200126499136
f 140200206491504
f 140200000463632
f 140200000464592
f 140200000465552
f 140200000466512
f 140200000172624
f 140200000173584
f 140200000174544
b 140200000174544
b 140200000173584
b 140200000172624
b 140200000466512
b 140200000465552
b 140200000464592
b 140200000463632
b 140200206491504
b 140200126499136
b 140200126548912
b 140200126500192
b 140200126500144
```
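
For anyone who wants to reproduce this without a ViT checkpoint, here is a small stand-alone check (not from the repository) that registers forward and backward hooks on a toy stack of linear layers and prints the layer index each time a hook fires. Forward hooks fire first-to-last and backward hooks fire last-to-first, which is why the recorded gradient list comes out reversed.

```python
import torch
import torch.nn as nn

# Three toy "blocks" standing in for transformer layers.
layers = nn.Sequential(*[nn.Linear(4, 4) for _ in range(3)])

for i, layer in enumerate(layers):
    # The default argument i=i captures the current index for each hook.
    layer.register_forward_hook(lambda m, inp, out, i=i: print("f", i))
    layer.register_full_backward_hook(lambda m, gin, gout, i=i: print("b", i))

layers(torch.randn(1, 4)).sum().backward()
# Prints: f 0, f 1, f 2, b 2, b 1, b 0
```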