snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

No Positive Samples in MolPCBA Test Set Assay #45 #269

Closed: raunakdoesdev closed this issue 2 years ago

raunakdoesdev commented 2 years ago

Minimal reproducible example:

```python
from ogb.graphproppred import PygGraphPropPredDataset
import torch

dataset = PygGraphPropPredDataset(name='ogbg-molpcba')
split_idx = dataset.get_idx_split()

# Sum the label vectors of all test molecules, treating missing labels (NaN) as 0,
# then read off the number of positives for task index 45.
test_positives = sum(torch.nan_to_num(data.y) for data in dataset[split_idx['test']])
print(test_positives[0, 45])
```

This prints 0, meaning there are no positive examples for this endpoint (task index 45) in the test split. How should the overall average precision be calculated in this case? The metric is undefined when there are zero positive samples.
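To check every task at once rather than only index 45, the positives can be counted per column. The sketch below is a minimal pure-Python illustration, assuming labels are given as rows of 1.0 / 0.0 / NaN, where NaN marks an unmeasured assay (as the `nan_to_num` call above implies). The helper `tasks_without_positives` and the toy data are hypothetical, not part of OGB.

```python
import math
from typing import List, Sequence


def tasks_without_positives(y: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of tasks (columns) that have no positive label.

    y holds one row per molecule; each entry is 1.0, 0.0, or NaN (missing).
    """
    num_tasks = len(y[0])
    positives = [0] * num_tasks
    for row in y:
        for task, label in enumerate(row):
            # NaN marks a missing measurement, so it never counts as a positive.
            if not math.isnan(label) and label == 1.0:
                positives[task] += 1
    return [task for task, count in enumerate(positives) if count == 0]


nan = float("nan")
y_test = [
    [1.0, 0.0, nan],
    [0.0, nan, 0.0],
    [nan, 0.0, 0.0],
]
print(tasks_without_positives(y_test))  # [1, 2]
```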

weihua916 commented 2 years ago

You are right. Label index 45 has no positive test examples, so Average Precision is undefined for that label. Our evaluator skips labels that have no positive examples (see here), so you won't run into this problem if you use it.
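The per-label skipping described above can be sketched in plain Python. This is an illustration under stated assumptions, not OGB's code: labels are rows of 1.0 / 0.0 / NaN, predictions are real-valued scores, `average_precision` and `mean_ap_skipping_undefined` are hypothetical helpers, and AP uses the step-wise definition (each recall increment weighted by the precision at that threshold). OGB's own evaluator delegates the per-task AP to scikit-learn's `average_precision_score`; raising `ValueError` when no task is usable is a choice made only for this sketch.

```python
import math
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum over thresholds of (R_n - R_{n-1}) * P_n.

    Assumes at least one positive label; tied scores form a single threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every example tied at this score as one threshold step.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_ap_skipping_undefined(
    y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]
) -> float:
    """Average AP over tasks, leaving out tasks with no positive labeled example."""
    per_task = []
    for task in range(len(y_true[0])):
        # Keep only molecules where this task was actually measured (label not NaN).
        labels, scores = [], []
        for row_true, row_pred in zip(y_true, y_pred):
            if not math.isnan(row_true[task]):
                labels.append(int(row_true[task]))
                scores.append(row_pred[task])
        if sum(labels) == 0:
            continue  # AP is undefined without positives, so skip this task
        per_task.append(average_precision(labels, scores))
    if not per_task:
        raise ValueError("no task has a positive label; mean AP is undefined")
    return sum(per_task) / len(per_task)


nan = float("nan")
y_true = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, nan, nan],
    [0.0, 0.0, 1.0],
]
y_pred = [
    [0.9, 0.8, 0.1],
    [0.6, 0.1, 0.7],
    [0.4, 0.5, 0.2],
    [0.7, 0.3, 0.9],
]
print(mean_ap_skipping_undefined(y_true, y_pred))  # 0.875
```

In this toy data, task 1 has only negative measured labels, the same situation as task 45 in the issue, so it is left out and the result is the mean of task 0 (AP 0.75) and task 2 (AP 1.0).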