snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.89k stars, 397 forks

Question concerning PygLinkPropPredDataset - get_edge_split() #248

Closed: bits-glitch closed this issue 2 years ago

bits-glitch commented 2 years ago

Hello OGB Team,

I have looked at several examples from the LinkPropPred Datasets and have a question concerning the train/test/val split.

Let's take an example:

from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name="ogbl-ppa")
split = dataset.get_edge_split()

If I look at "split", I find five entries:

split['train']['edge'] 
split['valid']['edge'] 
split['valid']['edge_neg'] 
split['test']['edge'] 
split['test']['edge_neg']

I understand that we need negative edge samples to test the model's predictions on negative (non-existent) edges as well, but why are negative training edges, i.e. split['train']['edge_neg'], not included? Does this mean that we do not train on negative edges and only measure performance on the negative validation and test links?

rusty1s commented 2 years ago

Negative training edges are expected to be generated inside the model via negative sampling. This mirrors the real-world scenario in which we are given only positive relations and want to infer the missing ones.
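
To make the idea concrete, here is a minimal sketch of uniform negative sampling in plain Python. It is only an illustration: the helper name `sample_negative_edges` and the toy edge list are assumptions made for this example and are not part of OGB. In PyTorch Geometric, `torch_geometric.utils.negative_sampling` serves the same purpose on tensors.

```python
import random


def sample_negative_edges(pos_edges, num_nodes, num_samples, seed=None):
    """Uniformly sample undirected node pairs that are not positive edges.

    Hypothetical helper for illustration; may return fewer than
    `num_samples` pairs if the graph is so dense that the attempt
    budget runs out.
    """
    rng = random.Random(seed)

    # Store each positive edge in both directions so that the
    # membership check does not depend on node order.
    known = set()
    for u, v in pos_edges:
        known.add((u, v))
        known.add((v, u))

    negatives = set()
    max_attempts = 100 * num_samples  # guard against an endless loop
    attempts = 0
    while len(negatives) < num_samples and attempts < max_attempts:
        attempts += 1
        u = rng.randrange(num_nodes)
        v = rng.randrange(num_nodes)
        # Reject self-loops and pairs that are known positive edges.
        if u == v or (u, v) in known:
            continue
        # Canonical order (smaller node first) avoids duplicate pairs.
        negatives.add((min(u, v), max(u, v)))
    return sorted(negatives)


# Toy positive training edges on a graph with 6 nodes.
train_pos = [(0, 1), (1, 2), (2, 3), (3, 4)]
train_neg = sample_negative_edges(train_pos, num_nodes=6, num_samples=4, seed=0)
```

In a training loop, fresh negatives would typically be drawn in each epoch or batch and scored alongside the positive edges, while validation and test keep using the fixed `edge_neg` sets from the split so that results stay comparable.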

bits-glitch commented 2 years ago

Thank you very much for the explanation!