pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Extension to graph model #246

Open tchaton opened 4 years ago

tchaton commented 4 years ago

Dear people of Captum,

I would like to add explainability within https://github.com/nicolas-chaulet/deeppointcloud-benchmarks. How complex would it be to extend Captum to support at least PyTorch Geometric (https://github.com/rusty1s/pytorch_geometric)?

Best, Thomas Chaton

NarineK commented 4 years ago

Hi @tchaton,

I've tried IG on the tutorial from pytorch_geometric and got it working with a small trick. I modified the original network to look something like this:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        # take x and edge_index as separate arguments instead of a single Data object
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)

Then I attributed to dataset[0].x. I'm not sure if that's what you want to attribute to, but it looks like all values in x are 0, so we might need to think of a different representation.

from captum.attr import IntegratedGradients

def custom_forward(x, edge_index):
    # the model expects x: [num_nodes, num_features] and edge_index: [2, num_edges]
    return model(x.squeeze(), edge_index[0])

ig = IntegratedGradients(custom_forward)
attr = ig.attribute(dataset[0].x, additional_forward_args=(dataset[0].edge_index.unsqueeze(0),), target=0)

Let me know if this helps. I'm not very familiar with PyTorch Geometric, so let me know if this makes sense to you.

tchaton commented 4 years ago

Hey @NarineK

dataset[0].x: in PyG, they use a Data wrapper which relies on Python's setattr to store tensors as attributes, e.g. x for the features, y for the labels, etc. So here you are accessing the features of the first sample.
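For reference, a minimal example of that wrapper (toy tensors, not from the benchmark repo):

import torch
from torch_geometric.data import Data

# two features per node, COO connectivity, one label per node
x = torch.randn(4, 2)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])
y = torch.tensor([0, 1, 0, 1])

data = Data(x=x, edge_index=edge_index, y=y)
data.x  # the node features, i.e. what dataset[0].x returns above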

OK, sounds great! So basically, as long as there is a forward and a backward, it should work even for custom modules.

IntegratedGradients should work, as it is going to call those functions.

Do you know if there are any methods that wouldn't work properly with custom modules? I had this trouble with ResNet when I implemented the gradient-based method from https://arxiv.org/abs/1611.06440: residual blocks were a bit complex to handle properly. I needed to create a wrapper for each layer, so the approach couldn't easily be adapted to new layers.

Best, Thomas Chaton.

NarineK commented 4 years ago

Hi @tchaton, yes, that's right: as long as we have a forward and a backward, it should work fine even with custom modules. You can try applying it to those custom modules and let us know if you see any problems.

tchaton commented 4 years ago

Hey @NarineK,

Last question: what would be your intuition about choosing a good baseline for point clouds, for classification and segmentation?

Next, we need to define simple input and baseline tensors. Baselines belong to the input space and often carry no predictive signal. Zero tensor can serve as a baseline for many tasks. Some interpretability algorithms such as Integrated Gradients, Deeplift and GradientShap are designed to attribute the change between the input and baseline to a predictive class or a value that the neural network outputs.

I could take random noise, or just points collapsed to zero, or the centroid of a given class?

NarineK commented 4 years ago

Hi @tchaton, it depends on the dataset and the task. There is also the option to use a distribution of baselines, sample from that distribution, and average the results: GradientShap does something similar. Some papers have shown that the more carefully you choose the baseline, the more meaningful your attributions become.

This paper studies the choice of baselines, but there are many other papers on the topic too: https://arxiv.org/pdf/1811.06471.pdf
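For example, a rough sketch for a point-cloud classifier (untested; model_forward, the toy weight, and the random point cloud below are just stand-ins for your own model and data):

import torch
from captum.attr import GradientShap

# stand-in for a real point-cloud classifier: any differentiable [B, N, 3] -> [B, num_classes] map
weight = torch.randn(3, 10)

def model_forward(points):
    return points.mean(dim=1) @ weight

point_cloud = torch.randn(1, 1024, 3)  # one cloud of 1024 xyz points

# a distribution of baselines: an all-zero cloud plus a few jittered copies of the input
baseline_dist = torch.cat([
    torch.zeros(1, 1024, 3),
    point_cloud + 0.1 * torch.randn(4, 1024, 3),
])

gs = GradientShap(model_forward)
attr = gs.attribute(point_cloud, baselines=baseline_dist, target=0, n_samples=20)

The zero cloud or a class centroid from your question would simply be additional rows in baseline_dist.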

CM-BF commented 4 years ago

Hi @NarineK, I guess it's a good idea to wrap the model to put the inputs into a form IG accepts, because IG only cares about the inputs here. But for a layer method such as GradCAM this doesn't work, because the inner layer's features don't have the shape the attribute method needs. Thus, I have to inherit from GradCAM and rewrite part of its attribute method like this:

        layer_gradients, layer_evals, is_layer_tuple = compute_layer_gradients_and_eval(
            self.forward_func,
            self.layer,
            inputs,
            target,
            additional_forward_args,
            device_ids=self.device_ids,
            attribute_to_layer_input=attribute_to_layer_input,
        )
        undo_gradient_requirements(inputs, gradient_mask)

        # what I add: reshape the [num_nodes, num_channels] layer activations and
        # gradients to [1, num_channels, num_nodes] so that nodes are treated like
        # spatial positions in the averaging below
        layer_gradients = tuple(layer_grad.transpose(0, 1).unsqueeze(0)
                           for layer_grad in layer_gradients)

        layer_evals = tuple(layer_eval.transpose(0, 1).unsqueeze(0)
                       for layer_eval in layer_evals)
        # end

        summed_grads = tuple(
            torch.mean(
                layer_grad,
                dim=tuple(x for x in range(2, len(layer_grad.shape))),
                keepdim=True,
            )
            for layer_grad in layer_gradients
        )

I think it is not very user-friendly if we always have to hack into your code and modify it in order to use it.

Saadman commented 4 years ago

Hi @NarineK, is there any code or explanation specifically for graph convolutional networks that I can look at? The Captum paper mentions that it can handle graph models, but I couldn't find any examples. It would be really helpful to know how Captum is used for such models. Thanks in advance!

joaquincabezas commented 3 years ago

Hi @Saadman,

Have you seen the Google Colab by Amin (@m30m)? He applies feature attribution to graphs using the Mutagenicity dataset from TUDatasets (from @chrsmrrs).

Check it out at: https://colab.research.google.com/drive/1fLJbFPz0yMCQg81DdCP5I8jXw9LoggKO?usp=sharing

It has just been added to the list of PyTorch Geometric Colab notebooks: https://pytorch-geometric.readthedocs.io/en/latest/notes/colabs.html

Regards, Joaquín.

Saadman commented 3 years ago

Hello Joaquin,

This is fantastic! I'll look into it a bit more and see if I can use it with my example.

Thank you!

Best, Rashid


FarzanT commented 3 years ago

Hello @NarineK, thank you for your example above. However, I'm currently using the AttentiveFP model (https://github.com/pyg-team/pytorch_geometric/blob/master/examples/attentive_fp.py) from PyTorch Geometric, which, in addition to data.x and data.edge_index, also takes data.edge_attr and data.batch as inputs. Furthermore, I have a multi-headed model, and the other head is basically a vanilla neural network. Since I don't fully understand what you've done in your example above, I tried to mimic your code, extending it to the inputs I described:

def custom_forward(graph_data, other_data):
    return cur_model([graph_data.x.squeeze(),
                      graph_data.edge_index[0],
                      graph_data.edge_attr[0],
                      graph_data.batch[0]],
                     other_data)
zero_dl_attr_train, \
zero_dl_delta_train = interpret_method.attribute(graph_data,
                                                 additional_forward_args=(other_data),
                                                 target=0,
                                                 return_convergence_delta=True)

This still results in the following error:

AssertionError: `inputs` must have type torch.Tensor but <class 'torch_geometric.data.batch.Batch'> found:

Do you have any suggestions on what else should be modified to make this compatible with at least the IntegratedGradients method?

Thank you!

FarzanT commented 3 years ago

Alright, I think I'm beginning to understand how to work with this; so far I have:

def custom_forward(graph_x, omic_data, graph_edge_index, graph_edge_attr, graph_smiles):
    batch = torch.zeros(graph_x.shape[0], dtype=int)
    cur_graph = MyGNNData(x=graph_x, edge_index=graph_edge_index[0],
                          edge_attr=graph_edge_attr[0], smiles=graph_smiles, batch=batch)
    cur_graph = GenFeatures()(cur_graph)

    return cur_model(cur_graph, [omic_data])

interpret_method = IntegratedGradients(custom_forward)

input_mask = torch.ones(cur_samples[1].x.shape[0],
                        cur_samples[1].x.shape[1]).requires_grad_(True)

zero_dl_attr_train, \
zero_dl_delta_train = interpret_method.attribute((input_mask, cur_samples[2][0]),
                                                 additional_forward_args=(cur_samples[1].edge_index.unsqueeze(0),
                                                                          cur_samples[1].edge_attr.unsqueeze(0),
                                                                          cur_samples[1].smiles[0]),
                                                 internal_batch_size=cur_samples[1].edge_index.shape[1],
                                                 return_convergence_delta=True)

My model expects an arbitrary number of inputs for the forward method, i.e.:

def forward(self, *inputs):

and in the example above I'm passing a tuple of a graph and a list of tensors (cur_samples[1] and cur_samples[2], respectively). I'm replacing the graph with the input_mask, following the PyTorch Geometric tutorial mentioned above. Unfortunately, I'm now getting this error:

RuntimeError: The expanded size of the tensor (33) must match the existing size (66) at non-singleton dimension 0.  Target sizes: [33, 159].  Tensor sizes: [66, 1]

input_mask.shape is torch.Size([33, 37]), and my DataLoader's batch size is 1.

If I set internal_batch_size=1, I get the following error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Upon removing all graph data from the input, the function runs normally:

zero_dl_attr_train, \
zero_dl_delta_train = interpret_method.attribute(cur_samples[2][0],
                                                 additional_forward_args=(cur_samples[1].x,
                                                                          cur_samples[1].edge_attr,
                                                                          cur_samples[1].edge_index,
                                                                          cur_samples[1].smiles[0]),
                                                 internal_batch_size=1,
                                                 return_convergence_delta=True)

Obviously, the model depends on the GNN input (i.e. it is used in the forward pass), unlike the example in issue #303. Any idea why the attribute function doesn't think the GNN input data is used in the computation graph?

Thank you!

FarzanT commented 3 years ago

I now understand the problem: I should not be creating a new graph data object inside the custom_forward function:

    cur_graph = MyGNNData(x=graph_x, edge_index=graph_edge_index[0],
                          edge_attr=graph_edge_attr[0], smiles=graph_smiles, batch=batch)
    cur_graph = GenFeatures()(cur_graph)

This probably copies the input tensors, detaching them from the computation graph, hence the error above. Removing it from the custom_forward function fixes the issue. In my problem, this becomes:

    def custom_forward(*inputs):
        # inputs arrive flattened: the omic tensors first, then the graph tensors,
        # with the number of omic tensors as the last element
        omic_length = inputs[-1]
        omic_data = inputs[0:omic_length]
        graph_x = inputs[-5]
        graph_edge_attr = inputs[-4]
        graph_edge_index = inputs[-3]
        batch = inputs[-2]

        return cur_model([graph_x, graph_edge_index, graph_edge_attr, batch], omic_data)
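It is then wired up roughly like this (a sketch with simplified names: omic_tensors stands for my list of non-graph input tensors and graph for the PyG data object):

interpret_method = IntegratedGradients(custom_forward)

# attribute to the omic tensors plus the node and edge features; edge_index,
# batch and the count of omic tensors are passed through untouched
attr, delta = interpret_method.attribute(
    (*omic_tensors, graph.x, graph.edge_attr),
    additional_forward_args=(graph.edge_index, graph.batch, len(omic_tensors)),
    target=0,
    internal_batch_size=1,
    return_convergence_delta=True,
)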

Cheers!

hanao2 commented 2 years ago

Hi @FarzanT,

Did you finally manage to explain your graph with Captum? I assume you were trying to perform a per-node feature importance task.

Thank you!

FarzanT commented 2 years ago

Hi @hanao2, yes, I was able to explain my graph and measure the importance of each feature at each node and edge. I've updated my comment above to show the custom_forward function that I used. Note that I'm using star expansion (the asterisk in *inputs) since my model can take an arbitrary number of inputs; you may not need this for your model. Since `inputs` is essentially a list, and I know the positions of my list items, I subset it and assign the parts to specific variables. However, I have to pass in the number of variable-length inputs so that I know how to subset the list, i.e.:

        omic_length = inputs[-1]
        omic_data = inputs[0:omic_length]

Then I know the rest of the list elements are my graph variables:

        graph_x = inputs[-5]
        graph_edge_attr = inputs[-4]
        graph_edge_index = inputs[-3]
        batch = inputs[-2]

Hope this helps you with your project!

hanao2 commented 2 years ago

Hi @FarzanT, thank you for the answer. I perform graph-level prediction (regression) with no edge features. Here is how I get the attributions for each sample (graph):

ig = IntegratedGradients(model)  # no custom forward function needed here
input = data.x  # data holds a single graph/sample
attributions, approximation_error = ig.attribute(input,
                                                 additional_forward_args=(data.edge_index, data.batch),
                                                 internal_batch_size=1,
                                                 return_convergence_delta=True)

Here is my question: how realistic were your results? Did you compute the importance per graph and average the scores (gradients) over all samples? The scores that I get for a single graph aren't quite interpretable. Any suggestions on that? Also, do you have any idea how I can get per-node feature importance (one score per node)?

Thank you!

FarzanT commented 2 years ago

@hanao2 My results were realistic in the sense that attribution scores were assigned to my nodes and edges. The graphs in my problem represent molecules, but I can't tell whether the attributions correspond to real-world chemistry.

I actually made sure not to average or sum the attributions across different features. For example, if each edge has 10 features, I would like to know, for each edge, which of those features are more important. I suggest you don't average attributions either, otherwise you lose information. The point of performing per-sample interpretation is to see which parts of the graph the model considers more important. If you have a single importance score per graph, you can only do a dataset-level analysis where you compare different graphs using that single score. Unfortunately, I don't know how to do that, so I suggest you pay more attention to node- and edge-level interpretation; I think it's more meaningful.

The ig.attribute call you have above should return attributions with the same shape as the input you attribute to, so you have to dissect those to get your feature-level attributions. Again, if you have multiple features per node and you want a single score per node, you have to come up with a way to combine the attributions of the different features on each node. You can average them, but again, I'd advise against it.
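If you do want a single score per node anyway, something like this would do it (a sketch; attributions is the [num_nodes, num_node_features] tensor your ig.attribute call returns for data.x):

# collapse the per-feature attributions into one magnitude per node
per_node_score = attributions.abs().sum(dim=1)
# nodes ranked from most to least influential for this prediction
top_nodes = per_node_score.argsort(descending=True)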

Good luck!