BaiMeiyingxue opened this issue 10 months ago
Hi, I checked the code and found that the problem occurs in this function:
def predict2(inputs, input_ids, all_test_audio, input_mask, segment_ids):
    logits, text_att, fusion_att, pooled_output_a, hidden_states, audio_attn, audio_weight, text_weight = model(
        input_ids=input_ids, all_audio_data=all_test_audio, attention_mask=input_mask,
        token_type_ids=segment_ids, inputs_embeds=inputs)
    return logits, text_att, fusion_att, pooled_output_a, hidden_states, audio_attn, audio_weight, text_weight

def custom_forward2(inputs, input_ids, all_test_audio, input_mask=None, segment_ids=None):
    logits, _, _, _, _, _, _, _ = predict2(inputs, input_ids, all_test_audio, input_mask, segment_ids)
    # logits = logits.detach().cpu().numpy()
    # logits = torch.argmax(logits, axis=-1)
    return torch.softmax(logits, dim=1)[0][0].unsqueeze(-1)
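For context, this forward function is passed to a Captum layer attribution method roughly as below (a minimal sketch: LayerGradientXActivation and the model.MAG layer handle reflect my own setup and are assumptions here, the exact attribution class I use may differ):

from captum.attr import LayerGradientXActivation

# custom_forward2 returns a single probability of shape [1], so no target index is needed
layer_ga = LayerGradientXActivation(custom_forward2, model.MAG)
layer_attributions = layer_ga.attribute(
    inputs,                                                     # the inputs_embeds tensor
    additional_forward_args=(input_ids, all_test_audio, input_mask, segment_ids),
)

Internally this reaches compute_layer_gradients_and_eval, which first evaluates the layer with a forward hook: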
saved_layer, output = _forward_layer_distributed_eval(
    forward_fn,
    inputs,
    layer,
    target_ind=target_ind,
    additional_forward_args=additional_forward_args,
    attribute_to_layer_input=attribute_to_layer_input,
    forward_hook_with_return=True,
    require_layer_grads=True,
)
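As I understand it, the overall pattern is roughly the following (a simplified, self-contained sketch of my own, not Captum's actual implementation): a forward hook stores the layer output, and gradients of the scalar output are then taken with respect to that stored activation.

import torch
import torch.nn as nn

# toy stand-ins for my model and the MAG layer, just to show the pattern
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
layer_of_interest = toy[1]

saved = {}
def hook(module, inp, out):
    saved["act"] = out                      # keep the activation inside the autograd graph
handle = layer_of_interest.register_forward_hook(hook)

x = torch.randn(1, 4)
prob = torch.softmax(toy(x), dim=1)[0][0].unsqueeze(-1)   # mirrors custom_forward2's output
handle.remove()

grads = torch.autograd.grad(torch.unbind(prob), saved["act"])
print(grads[0])   # gradient of the selected probability w.r.t. the layer activation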
In my run, the intermediate results stored in saved_layer are not zero. The printed saved_layer in compute_layer_gradients_and_eval is:
defaultdict(<class 'dict'>, {MAG(
(W_ha): Linear(in_features=256, out_features=128, bias=True)
(W_a): Linear(in_features=128, out_features=128, bias=True)
(LayerNorm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.5, inplace=False)
): {device(type='cuda', index=0): (tensor([[[ 78.7412, 73.8596, 73.8594, ..., 110.7228, 73.8589, 65.9760],
[ 79.4688, 73.8545, 73.8542, ..., 110.2769, 73.8539, 67.1542],
[ 82.7490, 73.8542, 73.8540, ..., 132.7583, 73.8536, 62.8921],
...,
[ 73.4833, 73.8537, 73.8534, ..., 76.7181, 73.8530, 73.1434],
[ 73.4835, 73.8539, 73.8536, ..., 76.7183, 73.8532, 73.1436],
[ 73.4835, 73.8539, 73.8537, ..., 76.7184, 73.8533, 73.1436]],
[[ 78.7414, 73.8593, 73.8592, ..., 110.7227, 73.8585, 65.9763],
[ 79.4685, 73.8537, 73.8536, ..., 110.2761, 73.8529, 67.1541],
[ 82.7489, 73.8536, 73.8536, ..., 132.7577, 73.8529, 62.8921],
...,
[ 73.4833, 73.8531, 73.8531, ..., 76.7178, 73.8523, 73.1435],
[ 73.4830, 73.8529, 73.8529, ..., 76.7176, 73.8521, 73.1432],
[ 73.4839, 73.8537, 73.8537, ..., 76.7185, 73.8529, 73.1440]],
[[ 78.7395, 73.8583, 73.8580, ..., 110.7207, 73.8575, 65.9752],
[ 79.4674, 73.8534, 73.8532, ..., 110.2752, 73.8526, 67.1537],
[ 82.7476, 73.8532, 73.8530, ..., 132.7564, 73.8524, 62.8916],
...,
[ 73.4819, 73.8526, 73.8523, ..., 76.7168, 73.8517, 73.1427],
[ 73.4820, 73.8526, 73.8524, ..., 76.7169, 73.8518, 73.1427],
[ 73.4826, 73.8533, 73.8531, ..., 76.7176, 73.8525, 73.1434]],
...,
[[ 81.5572, 128.5072, 112.7790, ..., 134.9548, 109.1424, 93.4030],
[138.1441, 272.8504, 229.1123, ..., 266.4277, 214.6286, 188.7362],
[126.5526, 229.1123, 226.2585, ..., 289.5139, 188.7424, 158.7619],
...,
[ 87.0023, 174.4372, 157.2018, ..., 146.8118, 160.7992, 158.1951],
[108.5435, 214.6286, 188.7424, ..., 167.3614, 192.2544, 182.7976],
[102.6403, 205.5971, 184.3762, ..., 166.0714, 184.3864, 182.7823]],
[[ 81.5615, 128.5496, 112.7817, ..., 135.0740, 109.1179, 93.4688],
[138.1872, 272.9772, 229.1659, ..., 266.7053, 214.6296, 188.9022],
[126.5548, 229.1659, 226.2448, ..., 289.7447, 188.6952, 158.8691],
...,
[ 87.0793, 174.6189, 157.3272, ..., 146.9802, 160.8787, 158.3711],
[108.5192, 214.6296, 188.6952, ..., 167.4441, 192.1724, 182.8588],
[102.7136, 205.7789, 184.5013, ..., 166.2566, 184.4481, 182.9819]],
[[ 81.5608, 128.5636, 112.7730, ..., 135.1378, 109.1116, 93.5008],
[138.1961, 273.0161, 229.1708, ..., 266.8467, 214.6322, 188.9795],
[126.5371, 229.1708, 226.2046, ..., 289.8467, 188.6646, 158.9085],
...,
[ 87.1190, 174.7090, 157.3798, ..., 147.0668, 160.9248, 158.4611],
[108.5132, 214.6322, 188.6646, ..., 167.4924, 192.1476, 182.8960],
[102.7509, 205.8658, 184.5499, ..., 166.3515, 184.4858, 183.0822]]],
device='cuda:0', grad_fn=<ReluBackward0>),)}})
and the output is:
output in compute_layer_gradients_and_eval: tensor([0.0024], device='cuda:0', grad_fn=<UnsqueezeBackward0>)
These values are then passed into

saved_grads = torch.autograd.grad(torch.unbind(output), grad_inputs)

but the resulting saved_grads are all zero. I have been stuck on this for about a week; could you give me some advice? @NarineK
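In case it helps, this is the check I run right before that call to see whether the saved activation is actually connected to the output in the autograd graph (my own debugging snippet, not Captum code; grad_inputs and output are the same objects as above, and treating grad_inputs[0] as the MAG activation is an assumption):

import torch

layer_act = grad_inputs[0]                           # assumed: the saved MAG activation printed above
print(layer_act.requires_grad, layer_act.grad_fn)    # expect True and a non-None grad_fn
print(output.requires_grad, output.grad_fn)          # expect a non-None grad_fn as well

# allow_unused=True returns None for activations that do not influence the output,
# which distinguishes a disconnected graph from a genuinely zero gradient
check = torch.autograd.grad(
    torch.unbind(output), grad_inputs, allow_unused=True, retain_graph=True)
print([g if g is None else g.abs().sum().item() for g in check])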
❓ Questions and Help
The attribution call returns layer_attributions like below:
the layer attribution values (all zeros):
tensor([[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]], device='cuda:0', grad_fn=<SumBackward1>)
What does this mean?