pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Every run produces a different attribution (GradientShap) #742

Closed shikhar-srivastava closed 3 years ago

shikhar-srivastava commented 3 years ago

Issue

Description of the case below:

# Imports needed by the snippet (implied in the original report)
import math
import numpy as np
import torch
from captum.attr import DeepLift, DeepLiftShap, GradientShap
from captum.attr import visualization as viz

# Defined baselines: half all-zeros, half all-ones images
rand_img_dist = torch.cat([
    torch.zeros((math.floor(batch_size / 2.0), 1, 224, 224)),
    torch.ones((math.ceil(batch_size / 2.0), 1, 224, 224)),
]).contiguous().cuda()

# Defined methods (net is defined previously)
dl = DeepLift(net)
dlshap = DeepLiftShap(net)
gradshap = GradientShap(net)

# Generated attributions
dl_attrs = dl.attribute(batch, target=0)
dlshap_attrs = dlshap.attribute(batch, rand_img_dist, target=0)
gradshap_attrs = gradshap.attribute(batch, rand_img_dist, target=0)

# Function to visualize an attribution tensor as a heat map
def viz_saliency(attrs, cfg):
    plt_fig, _ = viz.visualize_image_attr(
        attr=np.transpose(attrs.squeeze().unsqueeze(0).cpu().detach().numpy(), (1, 2, 0)),
        method="heat_map",
        sign="absolute_value",
        cmap="CMRmap",
        show_colorbar=False,
        use_pyplot=True,
    )

Now, GradientShap generates different attributions on every execution of:

gradshap_attrs = gradshap.attribute(batch, rand_img_dist, target=0)
viz_saliency(gradshap_attrs[0], cfg)

This is unlike DeepLift and DeepLiftShap, which produce the same attributions on every run of:

dl_attrs = dl.attribute(batch, target=0)
dlshap_attrs = dlshap.attribute(batch, rand_img_dist, target=0)
viz_saliency(dl_attrs[0], cfg)
viz_saliency(dlshap_attrs[0], cfg)
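
A quick check with torch.allclose on two back-to-back runs confirms this (a minimal sketch, reusing the gradshap and dl objects defined above):

# Two consecutive GradientShap runs disagree...
g1 = gradshap.attribute(batch, rand_img_dist, target=0)
g2 = gradshap.attribute(batch, rand_img_dist, target=0)
print(torch.allclose(g1, g2))  # False

# ...while two consecutive DeepLift runs match exactly
d1 = dl.attribute(batch, target=0)
d2 = dl.attribute(batch, target=0)
print(torch.allclose(d1, d2))  # True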
bilalsal commented 3 years ago

Hi Shikhar,

yes, Captum's implementation of GradientSHAP indeed relies on baselines that have a random component, and on points along the baseline-input line that are sampled at random. GradientSHAP further adds white noise to these sampled points.

To increase consistency across runs, I recommend setting the optional parameter n_samples to 20 or 30 when calling .attribute().
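
For example, here is a minimal sketch of that suggestion, reusing the net, batch, and rand_img_dist from your snippet (n_samples is an existing parameter of GradientShap.attribute; fixing the seed is plain PyTorch and additionally makes runs exactly repeatable):

import torch
from captum.attr import GradientShap

# Averaging over more randomly sampled points lowers run-to-run variance
# (n_samples defaults to 5)
gradshap = GradientShap(net)
gradshap_attrs = gradshap.attribute(batch, rand_img_dist, target=0, n_samples=30)

# Optionally, fix the global RNG seed so the random baselines and
# interpolation points are identical on every run
torch.manual_seed(0)
gradshap_attrs = gradshap.attribute(batch, rand_img_dist, target=0, n_samples=30)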

Hope this helps.