zhiqi-0 / PaGraph

SoCC'20 and TPDS'21: Scaling GNN Training on Large Graphs via Computation-aware Caching and Partitioning.
MIT License

How to prepare dataset? Using from dgl? #2

Open YijianLiu opened 1 year ago

YijianLiu commented 1 year ago

Hello, how do I prepare the dataset? I need your help. Thanks so much!

zhiqi-0 commented 1 year ago

Please check the code here: get dataset.

The README also explains the dataset format of adj.npz, labels.npy, etc. All of these files should be placed in a single data folder, whose path is passed via the --dataset argument.

Since I don't know what your dataset looks like, you will need to convert it into the above format yourself.
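As a rough sketch of such a conversion (not the repository's own script), the snippet below writes a toy graph into one folder as adj.npz and labels.npy. It assumes adj.npz is a SciPy sparse adjacency matrix saved with scipy.sparse.save_npz and labels.npy is a per-node label array saved with numpy.save; the helper name write_dataset_folder and the toy data are hypothetical. Any further files required by the README are not covered here, so check it for the exact list and shapes.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def write_dataset_folder(edges, labels, num_nodes, out_dir):
    """Hypothetical helper: write adj.npz and labels.npy into out_dir.

    Assumes (not confirmed by the thread) that adj.npz is a SciPy sparse
    adjacency matrix and labels.npy holds one label per node.
    """
    os.makedirs(out_dir, exist_ok=True)

    # Split the (src, dst) pairs into two index arrays.
    src, dst = np.asarray(edges).T
    data = np.ones(len(src), dtype=np.float32)

    # Build a (num_nodes, num_nodes) adjacency matrix in COO format.
    adj = sp.coo_matrix((data, (src, dst)), shape=(num_nodes, num_nodes))

    sp.save_npz(os.path.join(out_dir, "adj.npz"), adj)
    np.save(os.path.join(out_dir, "labels.npy"), np.asarray(labels))
    return out_dir


# Toy example: a directed 3-node cycle with two label classes.
demo_dir = write_dataset_folder(
    edges=[(0, 1), (1, 2), (2, 0)],
    labels=[0, 1, 0],
    num_nodes=3,
    out_dir=os.path.join(tempfile.mkdtemp(), "toy_dataset"),
)
```

The resulting folder path is what would then be passed to the --dataset argument.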

zhiqi-0 commented 1 year ago

To your Q1: Yes, it should be OK. For reddit_self_loop, I'm not sure whether DGL still supports downloading the Reddit self-loop dataset. You could double-check the DGL tutorials to see how they train models with the provided datasets.

To your Q2: Please checkout code in DG

zhiqi-0 commented 1 year ago

Sorry, I'll get back to you next month; I'm really busy these days. In the meantime, you can try reading the code to figure out how it works.