Closed by devnkong 3 years ago
Hi! Great point. data.edge_index is the same as split_edge['train'] and not disjoint (data.edge_index is made undirected to allow bi-directional message passing for undirected graph datasets). In our example code, we indeed perform message passing over data.edge_index and try to predict split_edge['train'].
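As a toy illustration of how the two can hold the same edges and still have different shapes, here is a minimal sketch. It uses plain Python tuples in place of tensors, and it assumes the training split stores each edge once while the message-passing graph stores both directions. The `to_undirected` helper is hypothetical, written only for this example; it is not the library function.

```python
def to_undirected(edges):
    """Return every edge in both directions, without duplicates.

    Hypothetical helper for this sketch; it only mimics the idea of
    symmetrizing an edge list for bi-directional message passing.
    """
    return sorted(set(edges) | {(v, u) for u, v in edges})


# Toy stand-in for split_edge['train']: each edge is stored once.
train_edges = [(0, 1), (1, 2), (2, 3)]

# Toy stand-in for data.edge_index: the same edges, made undirected.
edge_index = to_undirected(train_edges)

print(len(train_edges), len(edge_index))
print(all(edge in edge_index for edge in train_edges))
```

Under that assumption, the symmetrized list has up to twice as many entries as the training split, yet every training edge still appears in it, so a shape mismatch alone does not mean the two sets are disjoint.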
If you want to perform message passing over a subset of training edges, you need to explicitly write that code.
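One way this could look is sketched below, again with plain Python tuples standing in for tensors. The names `split_train_edges`, `supervision_frac`, and `to_undirected` are hypothetical and are not part of the OGB examples. The idea is to hold out a fraction of the training edges as prediction targets and pass messages only over the rest.

```python
import random


def split_train_edges(train_edges, supervision_frac=0.2, seed=0):
    """Split training edges into message-passing and supervision subsets.

    Hypothetical helper: the supervision edges are held out as prediction
    targets, and the remaining edges form the message-passing graph.
    """
    rng = random.Random(seed)
    shuffled = list(train_edges)
    rng.shuffle(shuffled)
    num_supervision = int(len(shuffled) * supervision_frac)
    supervision = shuffled[:num_supervision]
    message_passing = shuffled[num_supervision:]
    return message_passing, supervision


def to_undirected(edges):
    """Return every edge in both directions, without duplicates."""
    return sorted(set(edges) | {(v, u) for u, v in edges})


# Toy stand-in for split_edge['train'].
train_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

mp_edges, sup_edges = split_train_edges(train_edges, supervision_frac=0.4)

# Only the message-passing subset is symmetrized and fed to the GNN;
# the loss would be computed on sup_edges (plus sampled negatives).
mp_edge_index = to_undirected(mp_edges)
```

With this split, the model never sees a supervision edge during message passing, which differs from the example code, where the predicted edges are also part of the graph.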
Thanks for the prompt reply! That's crystal clear!
Hi, I'm a bit confused about the relationship between the training edge set and the graph adjacency we load for link tasks.
My understanding is that the two are disjoint sets: the adjacency is used for message passing, while the training edges are used to make predictions and compute the loss. I have no trouble with any of the full-batch examples, but I am confused by the code below: https://github.com/snap-stanford/ogb/blob/048ef636fda3d0c4a25b108651bf1d43050fa7ae/examples/linkproppred/citation2/cluster_gcn.py#L93
It seems that you compute the loss directly on the edges of the adjacency used for message passing (the train function does not use split_edge['train'] at all). Are these edges a stand-in for the training edges, or are the adjacency and the training edges identical? (I checked the shapes of the two, and they do not appear to be identical.)
Looking forward to your reply!