pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Convergence of LayerIntegratedGradients #1413

Open pakaphholbig opened 3 weeks ago

pakaphholbig commented 3 weeks ago

I was following this demonstration to interpret my DistilBERT-based model. I found that, in a few cases, IG does not converge even with a high n_steps value. However, if I swap the input and the baseline, which should only flip the sign of the sum of IG across all dimensions relative to the original, it does converge.

Here is a part of my code:

import torch
from captum.attr import LayerIntegratedGradients

lig = LayerIntegratedGradients(forward_pass, model.distilbert.embeddings)

attributions_1, delta_1 = lig.attribute(
    inputs=(input_ids_1, token_type_ids_1, attention_mask_1),
    baselines=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True,
)

attributions_ref, delta_ref = lig.attribute(
    inputs=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    baselines=(input_ids_1, token_type_ids_1, attention_mask_1),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True,
)

The reference (baseline) input consists only of the starting, ending, and padding tokens.
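
For context, here is roughly how that baseline is built (a sketch following the Captum BERT tutorial; tokenizer, seq_len, and device come from my own setup):

ref_token_id = tokenizer.pad_token_id  # padding token used as the "neutral" filler
cls_token_id = tokenizer.cls_token_id  # starting token
sep_token_id = tokenizer.sep_token_id  # ending token

# [CLS], then padding up to the input length, closed with [SEP]
ref_input_ids = torch.tensor([[cls_token_id] + [ref_token_id] * (seq_len - 2) + [sep_token_id]], device=device)
ref_token_type_ids = torch.zeros_like(ref_input_ids)
ref_attention_mask = torch.ones_like(ref_input_ids)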

I found that the two runs converge to different values. Specifically, for n_steps = 300, delta_1 = -0.001 while delta_ref = 0.387. Even after increasing n_steps to 900, the deltas stay the same. I would like to ask whether there is an explanation for this.
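
For reference, the delta can also be checked by hand along these lines (a sketch; it assumes forward_pass takes the three tensors positionally and returns a scalar score per example, which is how mine is written):

with torch.no_grad():
    score_input = forward_pass(input_ids_1, token_type_ids_1, attention_mask_1)
    score_ref = forward_pass(ref_input_ids, ref_token_type_ids, ref_attention_mask)

# Completeness: the summed attributions should approximately equal F(input) - F(baseline);
# the gap is what Captum reports as the convergence delta.
manual_delta_1 = attributions_1.sum() - (score_input - score_ref)
manual_delta_ref = attributions_ref.sum() - (score_ref - score_input)
print(manual_delta_1, manual_delta_ref)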

*Note: my predict and forward_pass functions are defined analogously to squad_pos_forward_func from the tutorial.
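
For reference, a minimal sketch of what such a forward function might look like in my setup (target_class_index is a hypothetical placeholder for the class I attribute to, and it assumes model is a Hugging Face classification head whose output exposes .logits; token_type_ids is accepted only so the signature matches the input tuple, since DistilBERT does not use it):

def forward_pass(input_ids, token_type_ids=None, attention_mask=None):
    # Return the probability of the target class as a scalar per example
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return torch.softmax(logits, dim=-1)[:, target_class_index]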