snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.93k stars, 397 forks

Batching Graphs in PygPCQM4MDataset #148

Closed (edwardelson closed this issue 3 years ago)

edwardelson commented 3 years ago

Hi, thanks for preparing the processing code!

I was wondering whether saving the SMILES graphs in batches across separate torch files would be a feasible way to reduce the memory requirement. I noticed that in the process() function of class PygPCQM4MDataset(InMemoryDataset), the graphs obtained from the SMILES strings are all combined into a single dataset and then torch.save'd into one file (only to be split again later, presumably into different dataloaders during training and testing).

Since all of the graphs are independent of each other, would it be possible to save them into several torch files, each holding a batch of graphs, to reduce the RAM requirement?
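To make the idea concrete, here is a rough sketch of what I have in mind. It is not the OGB implementation: the helper names `save_in_shards` and `iter_graphs`, the `shard_XXXXX.pkl` file naming, and the toy dict "graphs" are all hypothetical, and I use `pickle` from the standard library as a stand-in for `torch.save` / `torch.load` so the snippet runs without PyTorch installed. The point is only that graphs are flushed to disk in fixed-size chunks as they are produced, and read back one shard at a time.

```python
import os
import pickle
import tempfile
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence


def save_in_shards(graphs: Iterable[Any], out_dir: str, shard_size: int) -> List[str]:
    """Write graphs to disk in chunks of at most `shard_size` items.

    `graphs` may be a generator, so the full dataset never has to sit
    in memory at once. (Hypothetical helper, not part of OGB.)
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    it = iter(graphs)
    paths: List[str] = []
    while True:
        chunk = list(islice(it, shard_size))  # pull the next chunk only
        if not chunk:
            break
        path = os.path.join(out_dir, f"shard_{len(paths):05d}.pkl")
        with open(path, "wb") as f:
            # Assumption: with PyTorch this would be torch.save(chunk, path).
            pickle.dump(chunk, f)
        paths.append(path)
    return paths


def iter_graphs(paths: Sequence[str]) -> Iterator[Any]:
    """Yield graphs shard by shard; only one shard is loaded at a time."""
    for path in paths:
        with open(path, "rb") as f:
            # Assumption: with PyTorch this would be torch.load(path).
            shard = pickle.load(f)
        yield from shard


if __name__ == "__main__":
    # Toy stand-ins for the graphs built from SMILES strings.
    toy_graphs = [{"num_nodes": n} for n in range(2, 12)]
    with tempfile.TemporaryDirectory() as tmp:
        shard_paths = save_in_shards(iter(toy_graphs), tmp, shard_size=4)
        restored = list(iter_graphs(shard_paths))
        print(len(shard_paths), restored == toy_graphs)
```

The trade-off would be that random access across the whole dataset gets harder, since shuffling then has to happen within or across shards rather than over one in-memory list.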

Thanks!

edwardelson commented 3 years ago

Wait, I think this is more suitable for the discussion forum, sorry.