pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

How to do edge classification with RGCN #2051

Open Joel-De opened 3 years ago

Joel-De commented 3 years ago

❓ Questions & Help

Is there an example, or can someone show me a simple RGCN model architecture for edge classification? I've looked through the documentation and examples, but judging by the output shapes of the layers, it seems that the RGCN layer uses a graph's relational information to improve node classification. Is there a way to use RGCN to perform edge classification, or do I need to approach it from another angle? Currently, the RGCN layer requires the edge labels as one of its input parameters, but those are exactly the labels I'm trying to predict (from edge_index and the feature vectors), so I'm not sure how to go about doing that.

Thanks

rusty1s commented 3 years ago

If you do not have any edge types as input, then there is no need to use RGCNConv; instead, use other GNN layers such as GCNConv or SAGEConv. You can find an example of classical link prediction here (binary classification), which you may want to adapt to a multi-label edge classification problem. Here, you want to train against the ground-truth edge labels as well as the existence of an edge.
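
For reference, that classical setup looks roughly like this: a GNN encoder produces node embeddings, and a decoder scores node pairs. Below is a minimal sketch (not the linked example itself; all names and dimensions are placeholders), using a dot-product decoder for binary link prediction. For multi-label edge classification, you would replace the dot product with an MLP head that outputs one logit per label:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class Encoder(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

def decode(z, edge_label_index):
    # Dot product between source and destination embeddings -> one logit per edge:
    src, dst = edge_label_index
    return (z[src] * z[dst]).sum(dim=-1)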

always-ready2learn commented 2 years ago

@rusty1s Hi, I am currently working on a problem that requires handling multi-label edges, and I am able to build a custom knowledge graph for my multi-label edge classification problem. Could you please give a hint as to what modifications the GNN needs when feeding in the data so that the model can handle multi-label edge data?

rusty1s commented 2 years ago

Do you mean that you have multi-dimensional edge features, or do you want to perform multi-label edge classification?

You should be able to train multi-label classification via BCEWithLogitsLoss, see here:

z = model(x, edge_index)

# Concatenate source and destination node embeddings to obtain edge features:
edge_feat = torch.cat([z[edge_index[0]], z[edge_index[1]]], dim=-1)
edge_pred = self.mlp(edge_feat)
loss = criterion(edge_pred, data.edge_label)  # criterion = torch.nn.BCEWithLogitsLoss()

always-ready2learn commented 2 years ago

Hi @rusty1s, thanks for your response. This is exactly what I am doing in my code, but it always ends up crashing (due to an index error) at the line z = model(x, edge_index). I believe that in a multigraph (assuming multiple edges between two nodes), the forward and propagation classes have to be edited to cater for multiple edges (instead of one edge) between each pair of nodes. Is my understanding correct? Is there any example code for multigraph edge classification?

Regards

rusty1s commented 2 years ago

The index error may result from edge_index containing invalid indices, i.e. you should confirm that the following runs through:

assert edge_index.min() >= 0          # no negative node indices
assert edge_index.max() < x.size(0)   # every index must be smaller than num_nodes

Otherwise, duplicated edges are not a problem: PyG treats them as parallel edges by default.
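
If you do not want parallel edges, you can merge duplicates beforehand, e.g. (a sketch):

from torch_geometric.utils import coalesce

edge_index = coalesce(edge_index)  # sorts the edges and removes duplicates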

always-ready2learn commented 2 years ago

@rusty1s You are right, thanks for pointing that out. Could you please direct me to a fix for this issue?

    115 assert edge_index.min() >= 0
--> 116 assert edge_index.max() < x.size(0)

AssertionError:

rusty1s commented 2 years ago

This means that your graph is not correctly encoded. You need to ensure that all indices are correctly mapped to values between 0 and num_nodes - 1. Let me know if this helps!
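
A common way to do this is to re-index the IDs via torch.unique (a sketch; raw_src/raw_dst are hypothetical tensors holding your original, arbitrary node IDs):

import torch

raw_src = torch.tensor([10, 42, 42, 7])   # hypothetical source IDs
raw_dst = torch.tensor([42, 7, 10, 10])   # hypothetical destination IDs

# Map arbitrary IDs to consecutive indices in [0, num_nodes - 1]:
nodes, inverse = torch.unique(torch.cat([raw_src, raw_dst]), return_inverse=True)
edge_index = inverse.view(2, -1)

assert edge_index.min() >= 0 and edge_index.max() < nodes.numel()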

always-ready2learn commented 2 years ago

@rusty1s Thank you, that worked for me. But the GRUGconv model (that I'm using) for this experiment is giving low accuracy. I have tried changing model parameters and input features (edge features), but the accuracy always ends up at 65%. I am more concerned about getting the same accuracy after trying several different approaches. Can my graph model be correct if I keep getting the same accuracy value no matter what I try? What could I be missing?

rusty1s commented 2 years ago

If you are getting the same accuracy despite several changes, this is a sign that your model cannot learn anything. This might be due to the data or the model (hard to say). Is 65% training accuracy the performance you get by always picking the majority class? Why do you need the GRU in the first place? Does it perform better without it?
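
The majority-class baseline can be computed directly from the labels (a sketch, assuming integer class labels in data.edge_label):

counts = data.edge_label.bincount()
majority_acc = counts.max().item() / data.edge_label.numel()
print(f'Majority-class baseline: {majority_acc:.2%}')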

always-ready2learn commented 2 years ago

I am getting the same accuracy for GConv and SAGE as well. The model does show learning (in my opinion), as the loss continues to decrease and the accuracy-vs-epoch and loss-vs-epoch plots look correct, but it always converges to the same accuracy value. Could you give a hint as to what steps I can take to check/debug the issue?

rusty1s commented 2 years ago

GCNConv and SAGEConv are similar operators, so it is to be expected that they converge to similar performance. To verify the gains of graph-based machine learning, you can try to replace your model with a simple MLP (torch_geometric.nn.MLP). In case this also converges to the same accuracy, there is definitely something fishy going on.
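
A sketch of that baseline (channel sizes are placeholders):

from torch_geometric.nn import MLP

# The MLP ignores the graph structure entirely, so any remaining accuracy
# must come from the input features alone:
model = MLP(in_channels=16, hidden_channels=64, out_channels=32, num_layers=3)
z = model(x)  # no edge_index needed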

always-ready2learn commented 2 years ago

I replaced the model with torch_geometric.nn.MLP and, unfortunately, I am getting the same accuracy. :( I am lost at this point. Where could the issue in the code be? Could you please point me to the possibilities which may lead to this? Thank you for your guidance.

rusty1s commented 2 years ago

Interesting. At this point I think you should validate your existing features or engineer better ones. Can you confirm that all your features contain reasonable values, e.g., that there are no highly skewed values?
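
A quick sanity check could look like this (a sketch, assuming node features in x):

import torch

assert torch.isfinite(x).all()      # no NaNs or infs
print(x.mean(dim=0), x.std(dim=0))  # per-feature statistics; consider normalizing outliers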

always-ready2learn commented 2 years ago

Sorry for the late response, I got busy with other work. Yes, I did feature engineering and have no skewed data now. The accuracy did improve to 69%, but it now always converges to this point no matter what changes I make to the hyperparameters. I suspect this is due to the MLP on the edges, since the classification is done at the edges (which remains the same). Could my suspicion be correct? @rusty1s

rusty1s commented 2 years ago

Good to see the model performance is increasing. Can you clarify what you mean by "this is due to the MLP on the edges, since the classification is done at the edges (which remains the same)"?

rohandas14 commented 8 months ago

@rusty1s I would like to use RGCN for a link prediction task with just one node type but 5 different edge types.

  1. What should I pass to the edge_index parameter of the forward function? If I understand correctly, I'll have five different edge stores for the five edge types - do I just stack them together?
  2. If I stack the edge indices together, then is the edge_type just the edge label in [0, n-1] for the corresponding entry in edge_index?
  3. Since I also want to add negative samples, should the num_relations parameter for the RGCN layer be initialized with 5+1=6 relations?

Thank you for your time!

rusty1s commented 8 months ago

Yes, you can just concatenate them, and edge_type then holds the edge type (from 0 to 4) of every edge, as sketched below. For negative samples, I assume you want them in the output (not in the graph/input to the network), so you would map to 6 possible classes.
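
In code, this could look as follows (a sketch; edge_stores is a hypothetical list of five [2, num_edges_r] tensors, one per relation):

import torch

edge_index = torch.cat(edge_stores, dim=1)
edge_type = torch.cat([
    torch.full((store.size(1), ), rel, dtype=torch.long)
    for rel, store in enumerate(edge_stores)
])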

rohandas14 commented 8 months ago

Thanks @rusty1s! Right, I am only including negative samples during training, so I need to map to 6 different classes, where 0 corresponds to the negative-edge/no-edge case. This makes sense to me. However, just to clarify, does this mean I need to initialize the num_relations param with 6? Initializing with the true number of classes, i.e. 5, doesn't work, since I get an index-out-of-bounds exception. Initializing with 6 works, which kind of makes sense, because the RGCN layer cannot implicitly know whether negative samples are present or not.

rusty1s commented 7 months ago

Is there any specific reason you want to use negative edges for message passing? Ideally, you just want to train against them (via edge_label_index/edge_label), while keeping the original graph for message passing (edge_index/edge_type).
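
Schematically (a sketch; model and decoder are assumptions, with labels in [0, 5] and 0 reserved for "no edge"):

import torch.nn.functional as F

z = model(x, data.edge_index, data.edge_type)  # message passing over positive edges only
src, dst = data.edge_label_index               # supervision: positives + sampled negatives
logits = decoder(z[src], z[dst])               # hypothetical decoder with 6 output classes
loss = F.cross_entropy(logits, data.edge_label)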

rohandas14 commented 7 months ago

No, you are right. I don't want to use negative edges for message passing. From what I understand, if I have 5 positive edges and add negative edges during training, then edge_label will have values in [0, 5] assuming I use 0 as the label for the negative edge/no edge scenario. That's where I get the error with initializing the RGCN layer:

self.conv1 = RGCNConv(input_size, hidden_size, num_relations=6)

If I pass 5 to num_relations, given the number of positive edges, I get the index out-of-bounds error.

If it helps, I don't necessarily split my edges into message or supervision edges. My training set is a list of multiple small independent graphs, and I am not training on disjoint message/supervision edges. Each training batch is just one graph in the training set, and I use gradient accumulation to speed up training. Although this might sound counterintuitive, this is okay given that I am only training the graph neural net to learn rich representations that capture relational info, which is then used for a completely different downstream task.

Maybe I am implementing this incorrectly, but if I look at this example here: https://github.com/pyg-team/pytorch_geometric/blob/84ce7fe14d0fecaa9421fdfd122e1503b47530bc/examples/rgcn_link_pred.py#L90C1-L95C68

Does this mean that on line 90 data.edge_type corresponds to the message passing edges, and data.train_edge_type on lines 92 and 95 to the supervision edges? If so, then for my setup both of these should be the same, I think; data.edge_type and data.train_edge_type would only differ by the negative edges.

Or maybe I don't understand what you mean by training against negative edges. Am I not supposed to have a label for the negative edge and be able to predict it?

rusty1s commented 7 months ago

Does this mean that on line 90 data.edge_type corresponds to message passing edges? And data.train_edge_type on lines 92 and 95 are the supervision edges?

Yes. IMO, you should make sure that data.edge_type does not contain negative samples.

rohandas14 commented 7 months ago

Thank you so much! This makes sense to me now. Just one last follow-up if it's okay. Looking at the example again: https://github.com/pyg-team/pytorch_geometric/blob/e213c297bb2aeb9ac50db258f5ab01ea11aea349/examples/rgcn_link_pred.py#L92C1-L95C68

Why is the same data.train_edge_type (which I understand is just the positive edge labels) passed to the decoder for both positive and negative edge prediction? Shouldn't we pass just a tensor of zeros as the negative edge labels to the decoder for the negative edges?

rusty1s commented 7 months ago

Here in the example, we simply assume a random edge_type for the negatives, so we re-use train_edge_type when computing the score for negative links.
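
Schematically, following the linked example's structure (a sketch; treat the exact names as assumptions):

z = model.encode(data.edge_index, data.edge_type)
neg_edge_index = negative_sampling(data.train_edge_index, data.num_nodes)
pos_out = model.decode(z, data.train_edge_index, data.train_edge_type)
neg_out = model.decode(z, neg_edge_index, data.train_edge_type)  # types re-used for negatives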