snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

OGB Datasaver issues #135

Closed jqmcginnis closed 3 years ago

jqmcginnis commented 3 years ago

Hi ogb-team!

I followed your tutorial in ogb/ogb/io/Readme.md to create my own custom dataset for a link prediction task. The steps are very straightforward. Thank you very much for providing the details and examples.

After that, I tried to include my custom dataset (without official submission) by adding it to the master.csv in linkproppred via make_master_file.py. Now, when I load my custom dataset via the official API calls, the ogb package fetches my zipped file (via HTTP) and extracts it. However, I get several missing-file errors.

I have noticed that the storage formats of, e.g., ogbl-ddi and ogbl-mydataset are not compatible. Whereas ogbl-ddi (and others) use compressed csv files, the dataset I obtained by following your tutorial uses npy files (in a slightly different folder structure). Naturally, converting between the two formats is easy, so that should not be an issue. However, it would be even more convenient if this step could be omitted. What are your thoughts on that? Please let me know if you would like someone to implement this.

weihua916 commented 3 years ago

Hi! Thanks for your question! That's a great point. We have two ways of storing graph data: csv and npy. For the dataset saver, we use npy, since we later found that it leads to faster and more memory-efficient reading/writing than the csv format. Among the current OGB datasets, only ogbn-papers100M is saved in npy format, since this dataset is huge and using npy is critical. The other datasets are still saved in the old csv format.
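To make the difference between the two formats concrete, here is a minimal sketch using plain NumPy. It is not the actual OGB reading/writing code; the toy edge list and the file names `edge.csv.gz` and `edge.npy` are assumptions made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy edge list: 4 edges stored as (source, target) pairs.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0]], dtype=np.int64)

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "edge.csv.gz")  # illustrative file name
npy_path = os.path.join(tmp_dir, "edge.npy")     # illustrative file name

# csv route: values are written as text; NumPy gzips the file
# automatically because the file name ends in ".gz".
np.savetxt(csv_path, edges, fmt="%d", delimiter=",")

# npy route: the array is written as raw binary plus a small header.
np.save(npy_path, edges)

# Reading back: the csv route has to decompress and parse text,
# while the npy route loads the binary array directly.
edges_from_csv = np.loadtxt(csv_path, delimiter=",", dtype=np.int64, ndmin=2)
edges_from_npy = np.load(npy_path)

# Both routes recover the same edge list.
print(np.array_equal(edges_from_csv, edges_from_npy))
```

Both routes give back the same array; the difference is only in how much parsing is needed on the way in and out, which is what matters for very large graphs.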

For the package to know which format (csv or npy) is used, you need to specify binary in your make_master_file.py. In your case, since DatasetSaver uses the npy format, you need to set binary=True. An example is here.
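As a rough sketch of the idea (not the actual contents of make_master_file.py), a per-dataset metadata entry could carry the flag like this. Only the `binary` key is taken from the discussion above; the `url` key, the example.com URLs, the `ogbl-olddataset` entry, and the `storage_format` helper are hypothetical and exist only for illustration.

```python
# Hypothetical metadata entries, keyed by dataset name.
dataset_dict = {
    "ogbl-mydataset": {
        "url": "http://example.com/mydataset.zip",  # placeholder URL
        "binary": True,   # created with DatasetSaver, so stored as npy
    },
    "ogbl-olddataset": {  # hypothetical csv-based dataset
        "url": "http://example.com/olddataset.zip",  # placeholder URL
        "binary": False,  # stored as compressed csv
    },
}


def storage_format(meta):
    """Return the file format a loader should expect for this entry."""
    return "npy" if meta["binary"] else "csv"


for name, meta in dataset_dict.items():
    print(name, "->", storage_format(meta))
```

If the flag does not match the files inside the zip, a loader that branches this way would look for the wrong file names, which is consistent with the missing-file errors described above.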

Hope this helps!

jqmcginnis commented 3 years ago

@weihua916 Thank you very much, that's very helpful!