snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.89k stars 397 forks

Faster loading of MAG240M feats in DRAM #458

Open UtkrishtP opened 9 months ago

UtkrishtP commented 9 months ago

Hello Team,

I have sufficient DRAM in my system (close to 750 GB) and would like to load the feats in memory to exploit faster DRAM access. However, the features are stored in .npy format, which makes the loading process extremely slow.

For ogbn_papers100M and the other datasets in that family, the data is stored in compressed .npz format and also cached in a preprocessed directory in binary format, which makes loading from disk extremely fast.

Is it possible to reuse the same libraries for the MAG240M dataset, or is there any workaround?

TIA.