pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

About GraphSAGE sampling on weighted graph #1961

Open jindeok opened 3 years ago

jindeok commented 3 years ago

❓ Questions & Help

Hello. I really appreciate you sharing such implementation examples of GNNs. However, I have one short question.

I'm wondering what happens when I feed a weighted graph into the GraphSAGE example. Does the neighborhood sampling treat it as an unweighted (binary-edge) graph, or do the edge weights affect the sampling process?

rusty1s commented 3 years ago

Currently, SAGEConv does not support weighted graphs, but GraphConv does (which is quite similar). Note that you need to pass both edge_index and edge_weight to the GNN op.
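
For reference, a minimal sketch of what that call looks like (the toy tensors and channel sizes below are made up for illustration):

import torch
from torch_geometric.nn import GraphConv

# Hypothetical toy graph: 3 nodes, 2 undirected edges stored in both directions.
x = torch.randn(3, 16)                            # node features
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])         # COO connectivity
edge_weight = torch.tensor([0.5, 0.5, 2.0, 2.0])  # one weight per edge

conv = GraphConv(16, 32, aggr='mean')
out = conv(x, edge_index, edge_weight)            # weights scale each neighbor's message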

1byxero commented 3 years ago

Can we use a sparse matrix for the edge weights?

rusty1s commented 3 years ago

Do you mean using the SparseTensor class? That is possible by passing edge weights to the value argument:

adj_t = SparseTensor(row=row, col=col, value=edge_weight, sparse_sizes=(N, N))
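
A self-contained sketch of that pattern, with invented toy values; if row and col come straight from an edge_index, the .t() puts the tensor into the transposed adj_t layout that PyG's message-passing ops expect:

import torch
from torch_sparse import SparseTensor
from torch_geometric.nn import GraphConv

N = 4
row = torch.tensor([0, 1, 2])                # source nodes
col = torch.tensor([1, 2, 3])                # target nodes
edge_weight = torch.tensor([0.3, 0.9, 1.5])  # stored per edge, never densified to N x N

adj_t = SparseTensor(row=row, col=col, value=edge_weight,
                     sparse_sizes=(N, N)).t()

x = torch.randn(N, 16)
out = GraphConv(16, 32, aggr='mean')(x, adj_t)  # sparse adjacency replaces edge_index
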
1byxero commented 3 years ago

This means we need a dense tensor storing the edge weights initially, right?

I don't want to do that, as that dense tensor takes up too much GPU memory. Is there any workaround?

jindeok commented 3 years ago

Thank you for your reply! It was very helpful.

Can I ask one more question? I also wonder whether I can apply the NeighborSampler function to a weighted graph if I want to construct a training set via sampling.

rusty1s commented 3 years ago

NeighborSampler returns an e_id tensor, which can be used to index the original edge weights:

loader = NeighborSampler(edge_index, ...)
for batch_size, n_id, adjs in loader:
    for edge_index, e_id, size in adjs:
        # e_id holds the positions of the sampled edges in the original edge_index
        sampled_edge_weight = edge_weight[e_id]

jindeok commented 3 years ago

You mean edge_weight = data.edge_attr here?

Actually, I built the weighted graph with networkx and then converted it to torch_geometric data using the from_networkx() function, and I think this conversion does not also transfer the edge attributes from the networkx graph. (Is only the structural information of the graph converted?)

In short, I am now stuck converting a weighted networkx graph into weighted torch_geometric graph data. Can you give me some tips?

Thanks for reading!

rusty1s commented 3 years ago

Can you give me a short example to illustrate this issue?

jindeok commented 3 years ago

I'm sorry, I found that the from_networkx() method also converts edge weights.
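
(For reference, a minimal sketch of this conversion, with a toy graph invented for illustration; from_networkx() copies each edge attribute over under its networkx key:)

import networkx as nx
from torch_geometric.utils import from_networkx

# Hypothetical weighted graph: every edge carries a 'weight' attribute.
G = nx.Graph()
G.add_edge(0, 1, weight=0.5)
G.add_edge(1, 2, weight=2.0)

data = from_networkx(G)
edge_weight = data.weight  # tensor of shape [num_edges], aligned with data.edge_index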

Anyway, here is my working example:

(In the graphsage_unsupervised example code, I just replaced SAGEConv with GraphConv here.)

import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GraphConv

class myGNN(nn.Module):
    def __init__(self, in_channels, hidden_channels, num_layers):
        super(myGNN, self).__init__()
        self.num_layers = num_layers
        self.convs = nn.ModuleList()
        for i in range(num_layers):
            in_channels = in_channels if i == 0 else hidden_channels
            self.convs.append(GraphConv(in_channels, hidden_channels, aggr='mean'))

    def forward(self, x, adjs):
        for i, (edge_index, _, size) in enumerate(adjs):
            x_target = x[:size[1]]  # Target nodes are always placed first.
            x = self.convs[i]((x, x_target), edge_index)
            if i != self.num_layers - 1:
                x = x.relu()
                x = F.dropout(x, p=0.5, training=self.training)
        return x

    def full_forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            if i != self.num_layers - 1:
                x = x.relu()
                x = F.dropout(x, p=0.5, training=self.training)
        return x

data = from_networkx(G)  # G: weighted graph constructed in networkx
train_loader = NeighborSampler(data.edge_index, sizes=[10, 10], batch_size=256,
                               shuffle=True, num_nodes=data.num_nodes)

model.train()

total_loss = 0
for batch_size, n_id, adjs in train_loader:
    for edge_index, e_id, size in adjs:
        sampled_edge_weight = edge_weight[e_id]  # I am stuck on this part (this code does not work)
    adjs = [adj.to(device) for adj in adjs]
    optimizer.zero_grad()
    out = model(....)
    ...

1. How can I design the GNN forward method so that I can feed in a weighted graph?
2. As for the training phase, how can I get the weight information from train_loader?

rusty1s commented 3 years ago

You need to index-select the edge weights coming from data.edge_weight; the e_id tensor holds the positions of the sampled edges in the original edge_index. The correct example would look similar to:

def forward(self, x, adjs, edge_weight):
    for i, (edge_index, e_id, size) in enumerate(adjs):
        x_target = x[:size[1]]  # Target nodes are always placed first.
        x = self.convs[i]((x, x_target), edge_index, edge_weight[e_id])
        if i != self.num_layers - 1:
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
    return x

for batch_size, n_id, adjs in train_loader:
    ...
    model(x[n_id], adjs, data.edge_weight)

pintonos commented 2 years ago

Currently, SAGEConv does not support weighted graphs, but GraphConv does (which is quite similar). Note that you need to pass both edge_index and edge_weight to the GNN op.

For the inductive case, it seems SAGEConv performs better than GraphConv. Is there any possibility of extending SAGEConv with edge weights? And after reading the GraphConv paper, I am not sure why it should be similar to SAGEConv.

rusty1s commented 2 years ago

GraphConv is the same as SAGEConv if you specify aggr="mean". Let me know if that works for you.
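
To make the correspondence concrete: under default settings, both layers compute the node-wise update x_i' = W_1 x_i + W_2 · mean_{j in N(i)} x_j; GraphConv just additionally accepts edge weights, which scale each neighbor's message before the mean. A sketch (channel sizes are arbitrary):

from torch_geometric.nn import SAGEConv, GraphConv

sage = SAGEConv(16, 32)                 # mean aggregation, no edge-weight support
graph = GraphConv(16, 32, aggr='mean')  # same update, plus an optional edge_weight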

pintonos commented 2 years ago

GraphConv is the same as SAGEConv if you specify aggr="mean". Let me know if that works for you.

Works fine now, thanks! The documentation of GraphConv only gives its node-wise formulation. What would be the definition for the whole graph, as for GCNConv for instance?

rusty1s commented 2 years ago

X' = X W_1 + D^{-1} A X W_2

where A is the adjacency matrix (with the edge weights as entries) and D the diagonal degree matrix, so D^{-1} A performs the mean aggregation.

Kang9779 commented 2 years ago

You need to index-select the edge weights coming from data.edge_weight. The correct example would look similar to:

def forward(self, x, adjs, edge_weight):
    for i, (edge_index, e_id, size) in enumerate(adjs):
        x_target = x[:size[1]]  # Target nodes are always placed first.
        x = self.convs[i]((x, x_target), edge_index, edge_weight[e_id])
        if i != self.num_layers - 1:
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
    return x

for batch_size, n_id, adjs in train_loader:
    ...
    model(x[n_id], adjs, data.edge_weight)

I have a question about this: how can I generate positive and negative samples using NeighborSampler on a weighted graph?

In this code, pos_batch = random_walk(row, col, batch, walk_length=1, coalesced=True)[:, 1], how can the random walk follow the edge weights?

rusty1s commented 2 years ago

We do not have support for biased/weighted sampling of random walks yet, I am sorry.

ZRH0308 commented 1 year ago

I have a question about this: how can I generate positive and negative samples using NeighborSampler on a weighted graph?

In this code, pos_batch = random_walk(row, col, batch, walk_length=1, coalesced=True)[:, 1], how can the random walk follow the edge weights?

Hi, I have the same need as you. How did you solve it in the end?

ZRH0308 commented 1 year ago

We do not have support for biased/weighted sampling of random walks yet, I am sorry.

Hi @rusty1s, is it now possible to do weighted sampling of random walks? I need to sample positive neighbors based on the edge_weight.

If I just need to take a one-step neighbor, can I directly select the top neighbor with the largest edge_weight in the current batch for each node as a positive sample? Do you think that will work?

rusty1s commented 1 year ago

There is an open PR for this; see https://github.com/rusty1s/pytorch_cluster/pull/140. Maybe you can check it out to see if it fits your needs.