pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

GATConv only supports input x of dimensions 2 #2844

Open rahuldey91 opened 3 years ago

rahuldey91 commented 3 years ago

I am running a GNN on a mesh. The inputs are of size BxNxC, where B is the batch size, N is the number of nodes, and C is the number of channels per node. This input works fine with other conv layers such as GCNConv and ChebConv, but GATConv raises the error 'Static graphs not supported in `GATConv`'. Its forward code looks like this:

def forward(self, x: Union[Tensor, OptPairTensor], edge_index: Adj,
                size: Size = None, return_attention_weights=None):
        # type: (Union[Tensor, OptPairTensor], Tensor, Size, NoneType) -> Tensor  # noqa
        # type: (Union[Tensor, OptPairTensor], SparseTensor, Size, NoneType) -> Tensor  # noqa
        # type: (Union[Tensor, OptPairTensor], Tensor, Size, bool) -> Tuple[Tensor, Tuple[Tensor, Tensor]]  # noqa
        # type: (Union[Tensor, OptPairTensor], SparseTensor, Size, bool) -> Tuple[Tensor, SparseTensor]  # noqa
        r"""
        Args:
            return_attention_weights (bool, optional): If set to :obj:`True`,
                will additionally return the tuple
                :obj:`(edge_index, attention_weights)`, holding the computed
                attention weights for each edge. (default: :obj:`None`)
        """
        H, C = self.heads, self.out_channels

        x_l: OptTensor = None
        x_r: OptTensor = None
        alpha_l: OptTensor = None
        alpha_r: OptTensor = None
        if isinstance(x, Tensor):
            assert x.dim() == 2, 'Static graphs not supported in `GATConv`.'
            x_l = x_r = self.lin_l(x).view(-1, H, C)
            alpha_l = (x_l * self.att_l).sum(dim=-1)
            alpha_r = (x_r * self.att_r).sum(dim=-1)
        else:
            x_l, x_r = x[0], x[1]
            assert x[0].dim() == 2, 'Static graphs not supported in `GATConv`.'
            x_l = self.lin_l(x_l).view(-1, H, C)
            alpha_l = (x_l * self.att_l).sum(dim=-1)
            if x_r is not None:
                x_r = self.lin_r(x_r).view(-1, H, C)
                alpha_r = (x_r * self.att_r).sum(dim=-1)

        assert x_l is not None
        assert alpha_l is not None

        if self.add_self_loops:
            if isinstance(edge_index, Tensor):
                num_nodes = x_l.size(0)
                if x_r is not None:
                    num_nodes = min(num_nodes, x_r.size(0))
                if size is not None:
                    num_nodes = min(size[0], size[1])
                edge_index, _ = remove_self_loops(edge_index)
                edge_index, _ = add_self_loops(edge_index, num_nodes=num_nodes)
            elif isinstance(edge_index, SparseTensor):
                edge_index = set_diag(edge_index)

        # propagate_type: (x: OptPairTensor, alpha: OptPairTensor)
        out = self.propagate(edge_index, x=(x_l, x_r),
                             alpha=(alpha_l, alpha_r), size=size)

        alpha = self._alpha
        self._alpha = None

        if self.concat:
            out = out.view(-1, self.heads * self.out_channels)
        else:
            out = out.mean(dim=1)

        if self.bias is not None:
            out += self.bias

        if isinstance(return_attention_weights, bool):
            assert alpha is not None
            if isinstance(edge_index, Tensor):
                return out, (edge_index, alpha)
            elif isinstance(edge_index, SparseTensor):
                return out, edge_index.set_value(alpha, layout='coo')
        else:
            return out

So it seems like GATConv expects the input x to be two-dimensional, which is not the case with my input. I have the same issue with GATv2Conv, which is meant to address the static attention limitation of GATConv. So, does GATConv not support multiple graph inputs as a mini-batch? Or is there something I am missing here? Please help.
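
For reference, a minimal sketch of the behaviour described above (shapes are made up): the same 3D input goes through GCNConv but trips the assertion in GATConv.

import torch
from torch_geometric.nn import GCNConv, GATConv

x = torch.randn(4, 10, 16)                  # [B, N, C]: 4 graphs, 10 nodes each, 16 channels
edge_index = torch.randint(0, 10, (2, 30))  # one edge_index shared by all graphs in the batch

out = GCNConv(16, 32)(x, edge_index)        # works (static-graph support), out is [4, 10, 32]
out = GATConv(16, 32)(x, edge_index)        # AssertionError: Static graphs not supported in `GATConv`.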

zcaicaros commented 3 years ago

Can you show the test code?

I usually wrap the batch data as a torch_geometric.data.batch.Batch data type, and it supports any dimensionality of x.

rusty1s commented 3 years ago

As mentioned by @zcaicaros, GATConv supports mini-batch computation by wrapping each data object into a batch via torch_geometric.data.DataLoader. However, it does not support static graph computation yet (different feature matrices, single edge_index) without replicating edge_index via DataLoader. This is a current limitation as GATConv needs to learn an attention coefficient for each edge in the mini-batch.
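
A sketch of the DataLoader route described here (names and shapes are illustrative; in recent PyG versions the loader is exposed as torch_geometric.loader.DataLoader): each graph becomes its own Data object, and the loader concatenates node features and offsets the edge indices for you.

import torch
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GATConv

# One Data object per graph; all graphs here happen to share the same connectivity.
edge_index = torch.randint(0, 10, (2, 30))
data_list = [Data(x=torch.randn(10, 16), edge_index=edge_index) for _ in range(4)]

loader = DataLoader(data_list, batch_size=4)
layer = GATConv(in_channels=16, out_channels=32)

for batch in loader:
    out = layer(batch.x, batch.edge_index)  # batch.x is [4 * 10, 16], out is [4 * 10, 32]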

rahuldey91 commented 3 years ago

Thanks, I was able to get this to work either (i) by wrapping all graphs in a Batch object, or (ii) by manually concatenating the batch samples along the node axis so as to form a BNxC-sized input while simultaneously combining all B edge_index tensors as mentioned above (which is what the Batch object does internally anyway); see the sketch below. It would be great if this difference for GATConv were mentioned somewhere in the documentation, but I appreciate your help.
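
A sketch of workaround (ii), with illustrative shapes: flatten the B feature matrices into a single (B*N)xC node matrix and repeat edge_index B times with a node-id offset of N per copy, which is effectively what Batch.from_data_list does under the hood.

import torch
from torch_geometric.nn import GATConv

B, N, C = 4, 10, 16
x = torch.randn(B, N, C)
edge_index = torch.randint(0, N, (2, 30))   # connectivity shared by all B graphs

# (a) Concatenate the batch samples along the node axis: [B * N, C].
x_flat = x.reshape(B * N, C)

# (b) Repeat edge_index B times, offsetting the node ids by N for each copy.
offsets = torch.arange(B).repeat_interleave(edge_index.size(1)) * N
edge_index_flat = edge_index.repeat(1, B) + offsets

layer = GATConv(in_channels=C, out_channels=32)
out = layer(x_flat, edge_index_flat)        # [B * N, 32]
out = out.reshape(B, N, -1)                 # split back into per-graph outputs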

rusty1s commented 3 years ago

Good idea. I tried to introduce this information into our newly introduced GNN Cheatsheet table, see here.

rahuldey91 commented 3 years ago

Good idea. I tried to introduce this information into our newly introduced GNN Cheatsheet table, see here.

That sounds good. Thanks.

radandreicristian commented 2 years ago

@rahuldey91 what are the steps to reproduce the solution in (i)? What I did is:

The issue is that the check on line 202 fails, execution falls through to the else branch, and it errors on line 206 (because x is obviously not a src/dst tuple).

Did I miss something? Thanks.

rusty1s commented 2 years ago

Your procedure looks correct. What does the final Batch that you feed into GAT look like? Can you clarify why the check in line 202 should fail? If x is not a Tensor, what else is it in your case? :)

radandreicristian commented 2 years ago

I'm working on traffic data. The shape of a batch of input data is [batch_size, seq_len, n_nodes, d_hidden]

x = torch.randn((8, 12, 207, 16))

# 1200 random edges among the 207 nodes, in COO format.
edge_index = torch.randint(high=207, size=(2, 1200))

# This basically combines the batch and seq len dim into one - new shape is [batch_size * seq_len, n_nodes, d_hidden]
x = einops.rearrange(x, 'b l n f -> (b l) n f') 

# Convert from 3D tensor to a list of 2D tensors (each is a graph, shape [n_nodes, d_hidden]).
x = list(x) 

# Build a list of Data objects, each containing an item from the list above, and the (same) edge index
x = [Data(x=x_, edge_index=edge_index) for x_ in x] 

x = Batch.from_data_list(x)

layer = GATConv(in_channels=16, out_channels=16)

result = layer(x, edge_index=edge_index)

x is not a tensor but a DataBatch, so it goes to the else branch on line 205.

Here's an approach that worked for me, in case anyone wants to do something similar. I'm not sure it is the most elegant way to get the result, but it works.

x = torch.randn((8, 12, 207, 16))
edge_index = torch.randint(high=207, size=(2, 1200))
# Merge the batch and sequence dims: new shape is [batch_size * seq_len, n_nodes, d_hidden].
x = einops.rearrange(x, 'b l n f -> (b l) n f')
layer = GATConv(in_channels=16, out_channels=16)
# Apply the layer to each graph separately and stack the per-graph outputs back together.
result = torch.stack([layer(graph, edge_index=edge_index) for graph in x], dim=0)

rusty1s commented 2 years ago

The x in layer needs to correspond to the node feature matrix of your data/batch object:

data_list = [Data(x=x_, edge_index=edge_index) for x_ in x] 
batch = Batch.from_data_list(data_list)
layer = GATConv(in_channels=16, out_channels=16)
result = layer(batch.x, edge_index=batch.edge_index)

radandreicristian commented 2 years ago

Makes sense. Thank you for pointing out the mistake!

LE: For anyone looking for performance, the torch.stack approach may be faster than the Batch, depending on the data.

I did some experiments with data of shape (384, N, 32) with N ranging from 100 to 500, with 10*N edges, and for N>100 the first approach was faster (by up to 2x).

LLE: This is very inefficient. If you instead merge the other dimension(s) into the batch dimension, apply GAT once, and then split the dimensions again, you get a much faster result (~10x in my case).
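
For the traffic example above, a sketch of that merge-then-split idea (shapes taken from the earlier snippet, same offset trick as in the sketch further up): fold batch and sequence into one big disconnected graph, run GATConv once, and reshape back.

import torch
from torch_geometric.nn import GATConv

B, L, N, D = 8, 12, 207, 16
x = torch.randn(B, L, N, D)
edge_index = torch.randint(0, N, (2, 1200))  # connectivity shared by every (batch, step) copy

G = B * L                                    # number of graph copies
x_flat = x.reshape(G * N, D)                 # [B * L * N, D]

# Replicate edge_index once per copy, offsetting node ids by N per copy.
offsets = torch.arange(G).repeat_interleave(edge_index.size(1)) * N
edge_index_flat = edge_index.repeat(1, G) + offsets

layer = GATConv(in_channels=D, out_channels=D)
out = layer(x_flat, edge_index_flat).reshape(B, L, N, D)  # split dims back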

YSLLYW commented 2 years ago

As mentioned by @zcaicaros, GATConv supports mini-batch computation by wrapping each data object into a batch via torch_geometric.data.DataLoader. However, it does not support static graph computation yet (different feature matrices, single edge_index) without replicating edge_index via DataLoader. This is a current limitation as GATConv needs to learn an attention coefficient for each edge in the mini-batch.

How can I solve the above problem? I don't understand what this comment means.

nehSgnaiL commented 4 months ago

Makes sense. Thank you for pointing out the mistake!

LE: For anyone looking for performance, the torch.stack approach may be faster than the Batch, depending on the data.

I did some experiments with data of shape (384, N, 32) with N ranging from 100 to 500, with 10*N edges, and for N>100 the first approach was faster (by up to 2x).

LLE: This is very inefficient. If you just merge the other dimension(s) into the batch dimension apply GAT and then split the dimensions, you get a much faster result (~10x in my case).

In my experience, the (ii) approach mentioned by @rahuldey91 in https://github.com/pyg-team/pytorch_geometric/issues/2844#issuecomment-878758615 is faster :)

or (ii) by manually concatenating the batch-samples along the node-axis so as to form BNxC sized input while simultaneously combining all the B edge_indices

Boltzmachine commented 1 month ago

data_list = [Data(x=x_, edge_index=edge_index) for x_ in x] 
batch = Batch.from_data_list(data_list)
layer = GATConv(in_channels=16, out_channels=16)
result = layer(batch.x, edge_index=batch.edge_index)

This is inefficient because it involves a for loop; is there a more efficient way?