pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

About batch_size #429

Closed shuowang-ai closed 5 years ago

shuowang-ai commented 5 years ago

❓ Questions & Help

Hi, I want to know how to use batching when the shape of x in def forward(self, x, edge_index) is [batch_size, node_num, input_feature]. In cora.py, the shape of x is [node_num, input_feature]. How can I use batch_size?

Thank you!

rusty1s commented 5 years ago

We support mini-batches by concatenating node features along the node dimension and stacking adjacency matrices diagonally (see here). So you can either use the DataLoader provided by PyG or manually convert your data into this format. In case you want to use the same edge_index for all your batched node features, this issue might be of interest to you.
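
For illustration, a minimal sketch of this batching scheme with two made-up toy graphs (note that in recent PyG versions the DataLoader import has moved to torch_geometric.loader):

import torch
from torch_geometric.data import Data, DataLoader

# Two toy graphs with 3 and 4 nodes and 16 features per node.
g1 = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))
g2 = Data(x=torch.randn(4, 16), edge_index=torch.tensor([[0, 1], [2, 3]]))

loader = DataLoader([g1, g2], batch_size=2)
batch = next(iter(loader))
print(batch.x.shape)     # torch.Size([7, 16]) -> node features concatenated
print(batch.edge_index)  # g2's indices shifted by 3 -> diagonal stacking
print(batch.batch)       # tensor([0, 0, 0, 1, 1, 1, 1]) maps nodes to graphs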

shuowang-ai commented 5 years ago

That is useful information, but I am working on a graph generation problem, so edge_index and x vary in each epoch and I can't use the DataLoader to pre-define the mini-batches. Thanks to your MetaLayer, I implemented a batch-based MetaLayer which exchanges edge and node information along the batch dimension with a small trick.

import torch
import torch.nn as nn
from torch.nn import Linear, ReLU, Sequential
from torch_scatter import scatter_add


class MetaLayer(nn.Module):
    def __init__(self, n_in, n_out, e_h, n_h):
        super(MetaLayer, self).__init__()
        self.edge_mlp = Sequential(Linear(2 * n_in, e_h), ReLU())
        self.node_mlp_1 = Sequential(Linear(e_h + n_in, n_h), ReLU())
        self.node_mlp_2 = Sequential(Linear(n_h, n_out), ReLU())

    def _n2e_e2e(self, src, dest):
        # Build edge features from the features of both endpoints.
        out = torch.cat([src, dest], dim=-1)
        out = self.edge_mlp(out)
        return out

    def _e2n_n2n(self, x, edge_index, edge_attr, edge_value):
        src, dest = edge_index
        # Combine destination node features with the incoming edge features.
        out = torch.cat([x[:, dest], edge_attr], dim=-1)
        out = self.node_mlp_1(out)
        # Weight each message by its edge value (e.g. connection probability).
        msg = out * edge_value[None, :, None]
        # Sum the messages per destination node along the edge dimension.
        out = scatter_add(msg, dest, dim=1, dim_size=x.size(1))
        out = self.node_mlp_2(out)
        return out

    def forward(self, x, edge_index, edge_value):
        edge_src, edge_dest = edge_index
        edge_attr = self._n2e_e2e(x[:, edge_src], x[:, edge_dest])
        x = self._e2n_n2n(x, edge_index, edge_attr, edge_value)
        return x
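
A quick usage sketch of the layer above, with assumed shapes (batch size 8, 10 nodes, 40 edges shared across the batch):

layer = MetaLayer(n_in=16, n_out=32, e_h=64, n_h=64)
x = torch.randn(8, 10, 16)                  # [batch_size, node_num, input_feature]
edge_index = torch.randint(0, 10, (2, 40))  # same edges for every batch element
edge_value = torch.rand(40)                 # per-edge connection probability
out = layer(x, edge_index, edge_value)      # [8, 10, 32]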

The shape of x is [batch_size, node_num, input_feature]. From the experimental results, it works well. But I want to make it compatible with your MessagePassing framework, so that I can use the models in PyG, like GAT, GraphSAGE, etc. I am considering several solutions, such as converting x from [batch_size, node_num, input_feature] to [batch_size * node_num, input_feature], where each batch element is analogous to a sub-graph used to construct one big graph (the mini-batch) along the diagonal, just as the method mentioned here. But I don't know whether it's differentiable. And how do people use PyG for graph generation problems? Thank you!

rusty1s commented 5 years ago

I would go with your proposed solution, something like:

# Replicate edge_index once per batch element and offset the node indices
# so the adjacency matrices end up stacked diagonally:
edge_index = edge_index.view(2, 1, -1).repeat(1, x.size(0), 1) + torch.arange(x.size(0)).view(1, -1, 1) * x.size(1)
edge_index = edge_index.view(2, -1)
x = x.view(-1, num_features)  # flatten to [batch_size * num_nodes, num_features]
x = ... # use any PyG operator now
x = x.view(batch_size, num_nodes, num_features)  # restore the batch shape

Concerning graph generation: graph generation is a difficult problem and is mostly tackled using dense adjacency matrices, where edge weights denote the probability of an edge. You can do this for sparse adjacency matrices too, where gradients only pass through edge_value, not edge_index.
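
For illustration, a minimal sketch of the dense variant, where a predicted soft adjacency matrix (entries in [0, 1]; names and shapes here are made up) weights the aggregation so that gradients flow into the edge probabilities:

import torch

batch_size, num_nodes, num_features = 8, 10, 16
x = torch.randn(batch_size, num_nodes, num_features)
# Predicted edge probabilities, e.g. a sigmoid over pairwise node scores:
adj_prob = torch.rand(batch_size, num_nodes, num_nodes, requires_grad=True)
# Probability-weighted sum aggregation; gradients flow into adj_prob:
out = torch.bmm(adj_prob, x)  # [batch_size, num_nodes, num_features]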

shuowang-ai commented 5 years ago

Cool! I will try it! Thank you!

shuowang-ai commented 5 years ago

Regarding your point above that gradients only pass through edge_value, not edge_index:

What do you think about how to use edge_value in the PyG operators?

rusty1s commented 5 years ago

Can you elaborate?

shuowang-ai commented 5 years ago

As you know, edge_value of shape [1, E] has the same length as edge_index of shape [2, E], where E is the number of edges, and edge_value holds the connection probabilities. The generated adjacency looks like:

array([[0.6710922 , 0.88077555, 0.76953579, 0.74592817, 0.38684827],
       [0.67407643, 0.05540686, 0.07848255, 0.79545276, 0.58294445],
       [0.60898092, 0.55322303, 0.93865896, 0.0118191 , 0.55793872],
       [0.15374984, 0.92119851, 0.93366929, 0.2216791 , 0.03274837],
       [0.15263654, 0.15071275, 0.6112376 , 0.59826883, 0.80632444]])

So each element in edge_value is the corresponding connection probability in the adjacency matrix.

Then I want to do this operation: multiply the node features by the connection probabilities.

input = x * edge_value[None,:,None]
x = scatter_add(input, dest, dim=1, dim_size=x.size(1))

Another question: as you mentioned above, this operation keeps the indices from mixing up. But according to my experiments, edge_value is actually the source of the gradients, not edge_index, right? From what I understand, most of the examples PyG provides do not care about whether edge_index is differentiable, because in node classification and link prediction edge_index is the input data, not an intermediate step.

edge_index = edge_index.view(2, 1, -1).repeat(1, x.size(0), 1) + torch.arange(x.size(0)).view(1, -1, 1) * x.size(1)

https://arxiv.org/abs/1812.11482v2 is our previous work, and I am working on enlarging the graph. I want to replace the dynamics learner with a GNN based on MessagePassing; the gradient should be able to pass through the network converter (dense [batch_size, node_num, input_feature] to sparse [batch_size * node_num, input_feature]).
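
As a sanity check, the dense-to-sparse conversion of x itself is just a reshape, so it is differentiable (toy shapes assumed):

import torch

x = torch.randn(8, 10, 16, requires_grad=True)  # [batch, nodes, features]
x_flat = x.view(-1, 16)                         # [batch * nodes, features]
x_flat.sum().backward()
print(x.grad.shape)                             # torch.Size([8, 10, 16])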

rusty1s commented 5 years ago

Yes, gradients will be propagated through edge_value, not edge_index. So to convert your batch-wise scenario to PyG, you need to repeat edge_value batch_size times:

edge_value = edge_value.view(-1, 1).repeat(batch_size, 1).view(-1)  # shape [batch_size * E]

shuowang-ai commented 5 years ago

Can it simply be merged into x via x * edge_value after the repeat, before feeding it into the GCN forward?

To achieve this effect:

input = x * edge_value[None, :, None]
x = scatter_add(input, dest, dim=1, dim_size=x.size(1))

rusty1s commented 5 years ago

No, those tensors have different shapes: x holds one row per node, while edge_value holds one entry per edge.

shuowang-ai commented 5 years ago

Yes, I see. But how does x get combined with edge_value?

rusty1s commented 5 years ago

In the message function of the PyG operators. For GCNConv, simply pass edge_value to the forward call: conv(x, edge_index, edge_value).
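
For example, GCNConv accepts an optional edge weight as the third argument of its forward call, and gradients flow back into it (toy shapes below):

import torch
from torch_geometric.nn import GCNConv

conv = GCNConv(16, 32)
x = torch.randn(10, 16)                          # [num_nodes, in_channels]
edge_index = torch.randint(0, 10, (2, 40))       # [2, num_edges]
edge_value = torch.rand(40, requires_grad=True)  # per-edge probabilities

out = conv(x, edge_index, edge_value)  # edge_value weights the aggregation
out.sum().backward()
print(edge_value.grad.shape)           # torch.Size([40])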

shuowang-ai commented 5 years ago

Wow, I will check it closely. Do GAT and GraphSAGE have the same operation?

rusty1s commented 5 years ago

Sadly no, those operators are only defined for unweighted graphs. I suggest copy-pasting those operators and integrating the edge_weight yourself (in analogy to GCNConv).
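
A minimal sketch of that idea, assuming a simplified GraphSAGE-style operator built on MessagePassing whose messages are scaled by the edge weight (this is an illustration, not the actual GAT/GraphSAGE code):

import torch
from torch.nn import Linear
from torch_geometric.nn import MessagePassing

class WeightedSAGEConv(MessagePassing):
    # Hypothetical operator: a simplified GraphSAGE where each neighbor
    # message is scaled by its edge weight before sum-aggregation.
    def __init__(self, in_channels, out_channels):
        super(WeightedSAGEConv, self).__init__(aggr='add')
        self.lin_neigh = Linear(in_channels, out_channels)
        self.lin_self = Linear(in_channels, out_channels)

    def forward(self, x, edge_index, edge_weight):
        out = self.propagate(edge_index, x=x, edge_weight=edge_weight)
        return self.lin_neigh(out) + self.lin_self(x)

    def message(self, x_j, edge_weight):
        # x_j: features of source nodes, [E, in_channels]; edge_weight: [E].
        return edge_weight.view(-1, 1) * x_j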

shuowang-ai commented 5 years ago

I get it. I will try. Thank you sincerely!