snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

How can I get SMILES from PCQM4Mdataset? #318

Closed: shiokoo closed this issue 2 years ago

shiokoo commented 2 years ago

By inheriting from PCQM4Mdataset, downloading it directly, and processing it, I get a PyG Data object as input. My task currently needs the SMILES strings so that I can extract features other than those defined by smiles2graph in ogb.utils. How can I get the original SMILES?

weihua916 commented 2 years ago

Hi! You can do the following. See more description here.

from ogb.lsc import PCQM4Mv2Dataset
# ROOT is a placeholder for the directory where the dataset is (or will be) stored
dataset = PCQM4Mv2Dataset(root=ROOT, only_smiles=True)

# get i-th molecule and its target value (nan for test data)
i = 1234
print(dataset[i]) # ('CC(NCC[C@H]([C@@H]1CCC(=CC1)C)C)C', 6.811009678015001)
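Since each entry is a `(smiles, target)` tuple when `only_smiles = True`, custom features can be computed directly from the string. The following is only a minimal sketch of that idea, not part of OGB: `placeholder_featurizer`, `featurize_labeled`, and `toy_dataset` are hypothetical names, and the toy list stands in for the real dataset so the snippet runs without downloading anything. In practice you would presumably pass the `PCQM4Mv2Dataset` object and a real featurizer (for example one built on RDKit) instead.

```python
import math


def placeholder_featurizer(smiles):
    """Hypothetical stand-in for a real featurizer (e.g. one built on RDKit)."""
    return {
        "length": len(smiles),                 # number of characters in the SMILES
        "n_stereo_marks": smiles.count("@"),   # count of '@' chirality symbols
    }


def featurize_labeled(dataset, featurizer, indices):
    """Apply `featurizer` to the SMILES of every labeled entry in `indices`.

    `dataset[i]` is assumed to return a (smiles, target) tuple, as in the
    snippet above. Entries whose target is nan (test data) are skipped.
    """
    rows = []
    for i in indices:
        smiles, target = dataset[i]
        if math.isnan(target):
            continue
        rows.append((smiles, featurizer(smiles), target))
    return rows


# Toy stand-in for the dataset (assumption): one labeled entry taken from the
# example above and one made-up unlabeled entry with a nan target.
toy_dataset = [
    ("CC(NCC[C@H]([C@@H]1CCC(=CC1)C)C)C", 6.811009678015001),
    ("CCO", float("nan")),
]

rows = featurize_labeled(toy_dataset, placeholder_featurizer, range(len(toy_dataset)))
print(rows)
```

Passing the featurizer as an argument keeps the loop independent of any particular chemistry library, so the placeholder can be swapped out without touching the iteration logic.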