jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

KDD '23 | IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research #363

Closed: jasperzhong closed this issue 11 months ago

jasperzhong commented 11 months ago

https://arxiv.org/pdf/2302.13522.pdf

The paper proposes a large graph dataset.

jasperzhong commented 11 months ago

They propose a very large dataset with one homogeneous graph and one heterogeneous graph. The data comes from MAG; it is basically a scaled-up version of MAG.

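At the end the paper discusses the systems challenges. The mmap approach really is too slow: almost every access triggers a page fault followed by a read from disk, and you end up waiting a long time.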
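To make the mmap point concrete, here is a minimal sketch using only the Python standard library. It is not IGB's loader or file format: the flat float32 row-major layout, the 4096-byte page size, and the helper names `write_features`, `gather_rows`, and `pages_touched` are all assumptions made for illustration. It shows why sampled (scattered) node IDs hurt: they touch many more distinct pages than a contiguous range, and each cold page costs a page fault plus a disk read.

```python
import mmap
import os
import tempfile
from array import array


def write_features(path, num_nodes, dim):
    """Write a toy float32 feature matrix; row i is filled with the value i."""
    with open(path, "wb") as f:
        for i in range(num_nodes):
            array("f", [float(i)] * dim).tofile(f)


def gather_rows(path, dim, node_ids):
    """Read the feature rows for node_ids through a read-only memory map."""
    row_bytes = dim * 4  # assumes 4-byte float32 entries
    rows = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for i in node_ids:
                # Slicing copies bytes out of the mapping. If the page is not
                # already in the page cache, the OS handles a page fault and
                # reads it from disk before the slice returns.
                row = array("f")
                row.frombytes(mm[i * row_bytes:(i + 1) * row_bytes])
                rows.append(row.tolist())
    return rows


def pages_touched(node_ids, dim, page_size=4096):
    """Count distinct pages covered by the rows of node_ids (assumed layout)."""
    row_bytes = dim * 4
    pages = set()
    for i in node_ids:
        first = (i * row_bytes) // page_size
        last = ((i + 1) * row_bytes - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "feat.bin")
        write_features(path, num_nodes=64, dim=256)
        print(gather_rows(path, 256, [3, 17, 42])[0][:2])
    # With dim=256, four rows fit in one 4 KiB page.
    print(pages_touched([0, 1, 2, 3], dim=256))    # contiguous IDs
    print(pages_touched([0, 4, 8, 12], dim=256))   # scattered IDs
```

With 256-dimensional rows, the four contiguous IDs share a single page while the four scattered IDs land on four different pages. Neighbor sampling in GNN training produces the scattered pattern, which is consistent with the slowdown described above.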

That said, there is already quite a bit of work on training GNNs on SSD, such as MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks.

jasperzhong commented 11 months ago

https://arxiv.org/pdf/2306.16384.pdf

Looking through the citations, I found that there is already follow-up work. It builds a dataloader.