pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

A question about captum in graph classification tasks #4315

Open juanshu30 opened 2 years ago

juanshu30 commented 2 years ago

🐛 Describe the bug

I am running a graph classification task and when I finish training the model, I run:

captum_model = to_captum(model, mask_type='edge')
edge_mask = torch.ones(data.num_edges, requires_grad=True, device=device)

ig = IntegratedGradients(captum_model)
ig_attr_edge = ig.attribute(
    edge_mask.unsqueeze(0),
    target=data.y,
    additional_forward_args=(data),
    internal_batch_size=64,
)

Here, data is a batch from my train_loader. I get the error "Dimension 0 of input should be 1", even though edge_mask.unsqueeze(0) has shape (1, num_edges). Could you help me with this problem? Thanks!


rusty1s commented 2 years ago

This is just a guess, but I think you need to either specify output_idx = 0 or, alternatively, pass the target as a tensor of shape [1, 1]. @RBendias
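
A rough sketch of the two alternatives (untested; it assumes data holds a single graph, and output_idx is the index argument of to_captum):

captum_model = to_captum(model, mask_type='edge', output_idx=0)  # alternative 1: fix the explained output index
target = data.y.view(1, 1)                                       # alternative 2: target as a [1, 1] tensor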

RBendias commented 2 years ago

Currently, you can't use to_captum for graph classification batch-wise. Does the method work if you select only one graph and specify the internal_batch_size = 1? Setting output_idx = 0 should not be necessary.

Also set additional_forward_args = (data.x, data.edge_index).
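
For illustration, a minimal sketch of the suggested single-graph setup (dataset and device are placeholders here, and the exact forward arguments depend on your model):

import torch
from captum.attr import IntegratedGradients
from torch_geometric.nn.models.explainer import to_captum

data = dataset[0].to(device)  # explain one graph instead of a whole batch
captum_model = to_captum(model, mask_type='edge')
edge_mask = torch.ones(data.num_edges, requires_grad=True, device=device)

ig = IntegratedGradients(captum_model)
ig_attr_edge = ig.attribute(
    edge_mask.unsqueeze(0),
    target=int(data.y),
    additional_forward_args=(data.x, data.edge_index),  # append further args (e.g. a batch vector) if your model needs them
    internal_batch_size=1,
)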

juanshu30 commented 2 years ago

> Currently, you can't use to_captum for graph classification batch-wise. Does the method work if you select only one graph and specify the internal_batch_size = 1? Setting output_idx = 0 should not be necessary.
>
> Also set additional_forward_args = (data.x, data.edge_index).

Thanks, I tried the example: https://colab.research.google.com/drive/1fLJbFPz0yMCQg81DdCP5I8jXw9LoggKO?usp=sharing#scrollTo=9Hh3YNASuYxm

It generates an explanation for one selected graph.

bgeier commented 2 years ago

I'm trying to use this method to explain graph predictions, but I'm more interested in which nodes are important. Is mask_type="node" expected to work for graph classification? When running this snippet, adapted from the tutorial:

captum_model = to_captum(model, mask_type="node")

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    data.x.unsqueeze(0),
    target=int(data.y),
    additional_forward_args=(data.edge_index, data.edge_attr, data.batch),
    internal_batch_size=1,
)

I get the following error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

RBendias commented 2 years ago

Yes, to_captum should work for graph classification.

This error seems to occur when the input (in your case data.x) is not used in the forward pass, as described in this issue thread from the Captum repo: https://github.com/pytorch/captum/issues/303. What does the forward function of your model look like?
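
For context, the error can be reproduced outside of PyG with plain autograd; this is just a minimal sketch of what "not used in the graph" means, nothing specific to your model:

import torch

x = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)

out = (x * 2).sum()                     # `unused` never enters the computation
torch.autograd.grad(out, (x, unused))   # raises: One of the differentiated Tensors appears
                                        # to not have been used in the graph ...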

bgeier commented 2 years ago

data.x holds three integer columns that serve as indices into embedding layers. One of the embeddings is pre-trained and the other two are learned. I ultimately pool those embeddings with a dense layer before calling the graph layers. Below is the start of the forward, showing the arguments and how x is parsed into the inputs that get passed to the embedding layers. Is this OK?

Thanks for your help

def forward(self, x, edge_index, edge_attr, batch):
    # keep only the first edge-attribute column, cast to float
    edge_attr = edge_attr[:, 0].flatten().float()

    # split the integer columns of x into the indices for the embedding layers
    node_x = x[:, 0].int()
    ops_x = x[:, 1].int()
    dt_x = x[:, 2].int()
    # ... (embedding lookups, pooling, and graph layers follow)

RBendias commented 2 years ago

I am not 100% sure, but the problem could be the cast to integers: integer tensors cannot carry gradients, so requires_grad is effectively false afterwards and the inputs are detached from the autograd graph. To compare node importances, you could build a new model from the trained graph layers, compute the embeddings in advance, and pass those embeddings as inputs. Let me know if this works.

class NewModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.conv_1 = model.conv_1
        ...

    def forward(self, embeddings, edge_index):
        x = self.conv_1(embeddings, edge_index)
        ...

captum_model = to_captum(NewModel(model), mask_type="node")

ig.attribute(embeddings.unsqueeze(0), ...)
bgeier commented 2 years ago

You're right! Thanks. I had seen this issue with NLP in Captum but didn't make the connection. To make this work, I created a module that returns the embeddings using a forward hook, and I subclassed my model to rewrite the forward function while keeping my __init__ unchanged. For the embedding extraction I found this helpful example in a forum:

from typing import Dict, Iterable, Callable

import torch
from torch import nn, Tensor

# A class that returns embeddings from the trained embedding layers using a forward hook.
# It returns a dictionary mapping layer name -> output tensor.
class FeatureExtractor(nn.Module):
    def __init__(self, model: nn.Module, layers: Iterable[str]):
        super().__init__()
        self.model = model
        self.layers = layers
        self._features = {layer: torch.empty(0) for layer in layers}

        for layer_id in layers:
            layer = dict([*self.model.named_modules()])[layer_id]
            layer.register_forward_hook(self.save_outputs_hook(layer_id))

    def save_outputs_hook(self, layer_id: str) -> Callable:
        def fn(_, __, output):
            self._features[layer_id] = output
        return fn

    def forward(self, x: Tensor, edge_index: Tensor, edge_attr: Tensor, batch: Tensor) -> Dict[str, Tensor]:
        _ = self.model(x, edge_index, edge_attr, batch)
        return self._features

which allowed me to extract the trained embeddings via

embedder = FeatureExtractor(model=model, layers=['ex','ey','ez']) 
emb_dict = embedder(data.x, data.edge_index, data.edge_attr, data.batch)

Now I just had to load the weights shared between the reduced, subclassed model and the full model:

# define a freshly initialized model for just the GNN portion, i.e. the part after the integer embeddings
gnn = newModel(emb_x, **params)  # a subclassed model whose forward starts from the concatenated embeddings
# get pretrained weights 
pretrained_dict = checkpoint['model_state_dict'] 
# find dict of subclasses instance 
model_dict = gnn.state_dict() 
# 1. filter out unnecessary keys 
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict} 
# 2. overwrite entries in the existing state dict 
model_dict.update(pretrained_dict) 
# 3. load the new state dict 
gnn.load_state_dict(model_dict)

captum_model = pyg.nn.models.explainer.to_captum(gnn, mask_type='node')
emb_dict = embedder(data.x, data.edge_index, data.edge_attr, data.batch)
x = torch.cat((emb_dict['ex'],emb_dict['ey'],emb_dict['ez']), dim=1)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    x.unsqueeze(0),
    target=int(data.y),
    additional_forward_args=(data.edge_index, data.edge_attr, data.batch),
    internal_batch_size=1,
)
node_mask = np.abs(ig_attr[0].cpu().detach().numpy())

Thanks for your help!

fratajcz commented 2 years ago

I have another question along the same lines. As I understand it, the mask of ones is fed into the forward pass as edge weights, so that the "importance" of each edge gets mapped onto this mask via the gradients?

If that is correct, how would one go about explaining the edges in convolutions that do not use the edge_weight keyword, e.g. FiLMConv?

If it is not correct, why do we feed a mask of only ones into the explanation method?

Thanks!

rusty1s commented 2 years ago

Any GNN layer should work for this. The edge_mask of ones is applied within the MessagePassing module, i.e., each message is weighted by it, see here.
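
As a self-contained illustration (a hand-rolled aggregation, not the actual MessagePassing internals): because every message is multiplied by its mask entry, the gradient of the output with respect to a mask of ones acts as a per-edge importance score, regardless of whether the layer itself accepts edge_weight:

import torch

num_nodes, num_edges, dim = 4, 5, 8
x = torch.randn(num_nodes, dim)
edge_index = torch.randint(0, num_nodes, (2, num_edges))
lin = torch.nn.Linear(dim, 2)

edge_mask = torch.ones(num_edges, requires_grad=True)

src, dst = edge_index
messages = lin(x)[src] * edge_mask.view(-1, 1)                # each message weighted by its mask entry
out = torch.zeros(num_nodes, 2).index_add_(0, dst, messages)  # sum aggregation over incoming edges

out.sum().backward()
print(edge_mask.grad)  # gradient w.r.t. the ones-mask = per-edge influence on the output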

fratajcz commented 2 years ago

I see, thanks for the explanation! I think the explanation script is really helpful and adds a lot to the repository.

I have clicked through the explanation example and I see that there are no explanations for self-loops, even though self-loops are on by default in GCNs. Is there any way to "force" explanations for self-loops, e.g. by explicitly feeding them in via edge_index?

I can imagine cases, such as completely random edges, where the GNN regresses to something like an MLP by almost exclusively exploiting the self-loops and ignoring the (uninformative) edges. Using an explanation method as outlined in the example would then give the false impression that some edges are informative, just because we scale the scores between 0 and 1, while actually only the self-loops should be informative.

rusty1s commented 2 years ago

Yes, you should be able to get explanations for self-loops if they are pre-defined in edge_index. We do not explain self-loops that are only generated on-the-fly inside the layer. Note that the sigmoid on the edge_mask can still push all edge scores towards 0, though.
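
A hedged sketch of what pre-defining them could look like (GCNConv exposes an add_self_loops argument; in_channels and out_channels are placeholders):

import torch
from torch_geometric.nn import GCNConv
from torch_geometric.utils import add_self_loops

# make the self-loops explicit so they get their own entries in the edge mask
edge_index, _ = add_self_loops(data.edge_index, num_nodes=data.num_nodes)

# and tell the layer not to add them again on-the-fly
conv = GCNConv(in_channels, out_channels, add_self_loops=False)
out = conv(data.x, edge_index)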

fratajcz commented 2 years ago

Thanks again a ton! Does this mean I have to train the model with explicit self-loops already in place, or can I just add them for the explanation?

On another note, maybe it helps someone: I was also getting the RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior error. In my case it was because I had packaged my GNN into three Sequentials and only one of them was using edge_index. My forward method looked like this:

def forward(self, x, edge_index):
    x = self.pre_message_passing(x)            # some node-level feature extraction
    x = self.message_passing(x, edge_index)    # the actual GNN
    x = self.post_message_passing(x)           # the node-level classification network
    return x

Packaging it all into one continuous torch_geometric.nn.Sequential worked without errors!
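
For anyone running into the same thing, a rough sketch of such a single Sequential (layer types and sizes here are placeholders, not the actual model):

import torch
from torch_geometric.nn import Sequential, GCNConv

model = Sequential('x, edge_index', [
    (torch.nn.Linear(in_channels, 64), 'x -> x'),    # node-level feature extraction
    torch.nn.ReLU(),
    (GCNConv(64, 64), 'x, edge_index -> x'),         # the actual GNN
    torch.nn.ReLU(),
    (torch.nn.Linear(64, num_classes), 'x -> x'),    # node-level classification head
])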