Closed by jasperzhong 11 months ago
Proposes a very large dataset consisting of one homogeneous graph and one heterogeneous graph. The data comes from MAG; it is essentially a scaled-up version of MAG.
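At the end, the paper discusses the system challenges. The mmap approach really is too slow: nearly every access triggers a page fault followed by a disk read, and you end up waiting a long time.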
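A minimal Python sketch (not from the paper) of why mmap-based feature loading struggles here: a GNN mini-batch gathers feature rows for scattered node IDs, so the rows tend to land on different OS pages, and with a cold page cache each distinct page costs a page fault plus a disk read. The table size, row layout, and the helper names `write_features`, `gather_rows`, and `pages_touched` are hypothetical, and counting distinct pages is only a proxy for the real number of faults (readahead and a warm cache change it).

```python
import mmap
import os
import random
import struct
import tempfile

# Hypothetical sizes for illustration only: a small dense node-feature table.
NUM_NODES = 10_000
FEAT_DIM = 16              # float32 features per node
ROW_BYTES = FEAT_DIM * 4   # bytes per node row (row-major layout)


def write_features(path):
    """Write a row-major float32 feature table; row i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_NODES):
            f.write(struct.pack(f"{FEAT_DIM}f", *([float(i)] * FEAT_DIM)))


def gather_rows(mm, node_ids):
    """Read feature rows for node_ids through the mmap; each slice may fault a page in."""
    rows = []
    for nid in node_ids:
        off = nid * ROW_BYTES
        rows.append(struct.unpack(f"{FEAT_DIM}f", mm[off:off + ROW_BYTES]))
    return rows


def pages_touched(node_ids, page_size=mmap.PAGESIZE):
    """Count distinct OS pages a gather touches (upper bound on faults with a cold cache)."""
    pages = set()
    for nid in node_ids:
        start = nid * ROW_BYTES
        end = start + ROW_BYTES - 1
        pages.update(range(start // page_size, end // page_size + 1))
    return len(pages)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "feat.bin")
        write_features(path)
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            batch = random.Random(0).sample(range(NUM_NODES), 1024)  # sampled mini-batch
            rows = gather_rows(mm, batch)
            print("rows gathered:", len(rows))
            print("pages touched, random batch:", pages_touched(batch))
            print("pages touched, contiguous batch:", pages_touched(range(1024)))
```

The same 1024 rows touch only a handful of pages when they are contiguous, but close to every page of the file when they are sampled at random, which is the access pattern neighbor sampling produces. That gap is roughly what out-of-core systems try to close by controlling placement and I/O themselves instead of relying on demand paging.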
That said, there is already quite a bit of work on training GNNs on SSDs, e.g., MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks.
https://arxiv.org/pdf/2306.16384.pdf
Looking through the citations, I found there is already follow-up work, which builds a dataloader.
https://arxiv.org/pdf/2302.13522.pdf
propose a large dataset