pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Questions about NeighborSampler #3373

Closed MrShouxingMa closed 2 years ago

MrShouxingMa commented 2 years ago

❓ Questions & Help

Hi @rusty1s , I have two questions about NeighborSampler. I need to run graph convolutional networks on a large graph, so I used the NeighborSampler approach you recommended earlier, which handles the convolution over the large graph by splitting it into small subgraphs.

Question 1: I want to implement a standard GCN with NeighborSampler, but when I add the GCN weight matrix, it does not change after the loss is backpropagated. I also tried replacing the weight parameter with a linear layer, and moving the linear transformation before or after the NeighborSampler loop, but neither works. After my change, printing the variable gives: Parameter containing: tensor([[0.3245]], device='cuda:0', requires_grad=True). Although requires_grad=True shows that gradients should be able to flow back, the parameter never updates. I don't know how to get gradients to propagate back to the GCN parameters.

Question 2: A GCN has only one trainable parameter matrix per layer. I also want to make the adjacency matrix a trainable parameter (Kipf's paper mentions it can be binary or weighted). Besides using GAT, is there another way?

Many thanks in advance, and thanks for sharing your work! Best regards!

rusty1s commented 2 years ago

I'm not fully sure I understand the issues. Are you trying to add an additional learnable edge_weight argument to GCNConv? I don't think this is easily possible in a NeighborSampler scenario, as the adjacency matrices change across GNN layer computations.

MrShouxingMa commented 2 years ago

Let's discuss the first question first. After I add the trainable weight matrix, the code is as follows:

import torch
from torch.nn import Parameter
from torch_geometric.loader import NeighborSampler
from torch_geometric.nn.inits import uniform
from torch_geometric.utils import degree


class Sub_Base_gcn(torch.nn.Module):
    def __init__(self, in_channels, out_channels, normalize=True, bias=True, aggr='add', **kwargs):
        super(Sub_Base_gcn, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.convs = BaseModel(in_channels, out_channels)  # my own propagation module (not shown)
        self.weight = Parameter(torch.Tensor(in_channels, out_channels))
        self.reset_parameters()

    def reset_parameters(self):
        uniform(self.in_channels, self.weight)

    def forward(self, all_x, all_edge_index, all_edge_weight, deg_norm=True):
        if deg_norm:
            # symmetric normalization: D^{-1/2} * A * D^{-1/2}
            row, col = all_edge_index
            deg = degree(row, all_x.size(0), dtype=all_x.dtype).cuda()
            deg_inv_sqrt = deg.pow(-0.5)
            deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
            norm_edge_weight = deg_inv_sqrt[row] * all_edge_weight.cuda() * deg_inv_sqrt[col]
        else:
            norm_edge_weight = all_edge_weight.cuda()

        subgraph_loader = NeighborSampler(all_edge_index, node_idx=None, sizes=[-1], num_nodes=all_x.size(0),
                                          batch_size=60000, shuffle=False, num_workers=2, drop_last=False)
        xs = []
        for batch_size, n_id, adj in subgraph_loader:
            edge_index, e_id, size = adj
            x = all_x[n_id].cuda()
            edge_weight = norm_edge_weight[e_id].cuda()
            x_target = x[:size[1]]
            x = torch.matmul(x, self.weight)  # apply the trainable weight matrix
            print(self.weight)
            x = self.convs((x, x_target), edge_index.cuda(), edge_weight)
            xs.append(x.cpu())
        x_all = torch.cat(xs, dim=0)

        return x_all.cuda()

or

from torch import nn
from torch.nn import Linear


class Sub_Base_gcn(torch.nn.Module):
    def __init__(self, in_channels, out_channels, normalize=True, bias=True, aggr='add', **kwargs):
        super(Sub_Base_gcn, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.lin = Linear(in_channels, out_channels, bias=False)  # linear layer replacing the explicit weight matrix
        nn.init.xavier_normal_(self.lin.weight)
        self.convs = BaseModel(in_channels, out_channels)  # my own propagation module (not shown)

    def reset_parameters(self):
        nn.init.xavier_normal_(self.lin.weight)

    def forward(self, all_x, all_edge_index, all_edge_weight, deg_norm=True):
        if deg_norm:
            row, col = all_edge_index
            deg = degree(row, all_x.size(0), dtype=all_x.dtype).cuda()
            deg_inv_sqrt = deg.pow(-0.5)
            deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
            norm_edge_weight = deg_inv_sqrt[row] * all_edge_weight.cuda() * deg_inv_sqrt[col]
        else:
            norm_edge_weight = all_edge_weight.cuda()
        all_x = self.lin(all_x)  # linear transformation applied before sampling
        print(self.lin.weight)

        subgraph_loader = NeighborSampler(all_edge_index, node_idx=None, sizes=[-1], num_nodes=all_x.size(0),
                                          batch_size=60000, shuffle=False, num_workers=2, drop_last=False)
        xs = []
        for batch_size, n_id, adj in subgraph_loader:
            edge_index, e_id, size = adj
            x = all_x[n_id].cuda()
            edge_weight = norm_edge_weight[e_id].cuda()
            x_target = x[:size[1]]
            x = self.convs((x, x_target), edge_index.cuda(), edge_weight)
            xs.append(x.cpu())
        x_all = torch.cat(xs, dim=0)

        return x_all.cuda()

The trainable weight matrix does not change in either version!

rusty1s commented 2 years ago

This looks mostly correct, although you never optimize your parameter via torch.optim.Adam. Might this be the reason?
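For example, a rough (untested) sketch of what I mean, where the model sizes and loss are just placeholders for whatever your full model uses:

model = Sub_Base_gcn(in_channels=64, out_channels=64).cuda()  # placeholder sizes
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)     # must include the module holding self.weight / self.lin

optimizer.zero_grad()
out = model(all_x, all_edge_index, all_edge_weight)           # your inputs
loss = compute_loss(out)                                      # placeholder: whatever loss your full model computes
loss.backward()
optimizer.step()                                              # only parameters registered above get updated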

MrShouxingMa commented 2 years ago

Thank you for your prompt reply! My model is quite large, so this is only one part of it, and I do use torch.optim.Adam to optimize my parameters. I am trying to isolate the problem by showing you this module on its own.

MrShouxingMa commented 2 years ago

The first problem has been solved. Thank you for checking the code; it is indeed fine. I printed all the gradients and found that they do propagate but are zero. The reason is that when debugging on my own device I shrank the whole graph, which caused the gradients to become zero. Debugging on the server later works normally.

Now for the second question: I have to use NeighborSampler because the graph is too large. What should I do if I want to turn the adjacency matrix into a trainable parameter matrix in a GCN?

rusty1s commented 2 years ago

I think your example is already correct in this regard, that is, you have a global and trainable edge_weight matrix (passed to the optimizer as well), which you index for specific adjacencies via edge_weight = self.edge_weight[adj.e_id].
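A minimal (untested) sketch of this pattern, re-using the names from your snippet and assuming num_edges is known up front and BaseModel accepts an edge_weight argument:

class EdgeWeightGCN(torch.nn.Module):
    def __init__(self, in_channels, out_channels, num_edges):
        super().__init__()
        self.convs = BaseModel(in_channels, out_channels)     # your propagation module
        self.edge_weight = Parameter(torch.ones(num_edges))   # one trainable weight per edge

    def forward(self, all_x, subgraph_loader):
        xs = []
        for batch_size, n_id, adj in subgraph_loader:
            edge_index, e_id, size = adj
            x = all_x[n_id].cuda()
            x_target = x[:size[1]]
            edge_weight = self.edge_weight[e_id].cuda()       # slice only the sampled edges
            x = self.convs((x, x_target), edge_index.cuda(), edge_weight)
            xs.append(x.cpu())
        return torch.cat(xs, dim=0)

As long as this module's parameters are passed to the optimizer, the sliced entries receive gradients and get updated.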

MrShouxingMa commented 2 years ago

Maybe I didn't express it clearly: my adjacency matrix stores the weights of the edges, while the size of the trainable weight depends on the dimensionality of the input and output features. As described in Kipf's paper,

math::
      {H}^{(l+1)} = \sigma (\tilde{D}^{-\tfrac{1}{2}}\tilde{A}\tilde{D}^{-\tfrac{1}{2}}{H}^{(l)}{W}^{(l)})

W denotes the trainable matrix, and my question is that I want to make A a trainable parameter as well, assigned at the first initialization, because A stores the weights of the edges.

rusty1s commented 2 years ago

I'm not sure I understand what you mean by edge_weight being dependent on the dimensionality of input and output features. As far as I can tell, you want to learn a vector edge_weight with shape [num_edges]. To do this in a neighbor sampling scenario, you will have to index the weights that are actually used during computation (via edge_weight[adj.e_id]).
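To make the shapes concrete (illustrative only, the variable names are placeholders):

# W in the formula above: one matrix per layer, depends on the feature dimensions
weight = Parameter(torch.empty(in_channels, out_channels))

# trainable adjacency entries: one scalar per edge, independent of the feature dimensions
edge_weight = Parameter(torch.ones(num_edges))

# inside the sampling loop, only the sampled edges are needed:
batch_edge_weight = edge_weight[adj.e_id]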

MrShouxingMa commented 2 years ago

I think I understand a bit better now. Thank you for your advice; I am trying it myself.