GraphBolt is a new data loading framework for GNNs that is agnostic to the GNN library in use. In particular, it provides feature store and sampling routines for scalable data loading across CPU and GPU devices. The GraphBolt repository also contains PyG examples.
This issue tracks progress towards in-house GraphBolt support in PyG, i.e., by providing a backend option in the NeighborLoader and LinkNeighborLoader classes.
FeatureStore
[ ] Implement torch_geometric.data.CUDAFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.GPUCachedFeature features.
[ ] Implement torch_geometric.data.OnDiskFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.OnDiskFeature features. (TBD)
Samplers
[ ] Implement torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler), which uses GraphBolt as the backend for performing sample_from_nodes and sample_from_links.
[ ] Support temporal sampling in GraphBoltNeighborSampler
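As a rough illustration of the sampler item above, here is a minimal sketch of how a GraphBolt-backed sampler could delegate both entry points to one backend routine. Everything in it is a stand-in: `BaseSamplerStandIn`, `GraphBoltNeighborSamplerSketch`, and `fake_backend` are hypothetical names, no real PyG or GraphBolt API is called, and the method names simply follow the wording of this issue.

```python
from abc import ABC, abstractmethod


class BaseSamplerStandIn(ABC):
    """Hypothetical stand-in for a sampler interface (not the real PyG class)."""

    @abstractmethod
    def sample_from_nodes(self, seed_nodes):
        ...

    @abstractmethod
    def sample_from_links(self, seed_links):
        ...


class GraphBoltNeighborSamplerSketch(BaseSamplerStandIn):
    """Delegates sampling to an injected backend callable (assumption:
    the backend takes a list of seed nodes and a list of fanouts)."""

    def __init__(self, backend_sample_fn, num_neighbors):
        self._sample = backend_sample_fn
        self.num_neighbors = list(num_neighbors)

    def sample_from_nodes(self, seed_nodes):
        # Seed nodes are passed to the backend unchanged.
        return self._sample(list(seed_nodes), self.num_neighbors)

    def sample_from_links(self, seed_links):
        # The unique endpoints of the seed links become the seed nodes.
        seeds = sorted({node for link in seed_links for node in link})
        return self._sample(seeds, self.num_neighbors)


def fake_backend(seeds, fanouts):
    """Placeholder for a real GraphBolt sampling call; it only echoes its inputs."""
    return {"seeds": seeds, "fanouts": fanouts}


sampler = GraphBoltNeighborSamplerSketch(fake_backend, num_neighbors=[10, 5])
node_out = sampler.sample_from_nodes([3, 1])
link_out = sampler.sample_from_links([(2, 0), (0, 1)])
```

Injecting the backend as a callable keeps the sketch independent of GraphBolt itself; an actual implementation would presumably build the GraphBolt sampling pipeline inside the class.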
Data Loaders
[ ] Implement a backend option in NeighborLoader and LinkNeighborLoader that creates NeighborSampler instances based on the chosen backend (backend="default" -> sampler.NeighborSampler, backend="graphbolt" -> sampler.GraphBoltNeighborSampler)
[ ] Test GPU-based sampling via backend="graphbolt"
[ ] Integrate graphbolt.ItemSampler and datapipe.fetch_feature routines into NeighborLoader and LinkNeighborLoader when the chosen backend is set to "graphbolt"
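One possible shape for the backend option is a small registry that maps the backend string to a sampler class. The sketch below is only an assumption about how this could look: both classes are empty stand-ins rather than the real PyG samplers, and `make_neighbor_sampler` is a hypothetical helper, not an existing function.

```python
class NeighborSampler:
    """Stand-in for the default sampler (not the real PyG class)."""

    def __init__(self, num_neighbors):
        self.num_neighbors = list(num_neighbors)


class GraphBoltNeighborSampler(NeighborSampler):
    """Stand-in for the proposed GraphBolt-backed sampler."""


# Hypothetical registry: backend name -> sampler class.
_SAMPLER_BACKENDS = {
    "default": NeighborSampler,
    "graphbolt": GraphBoltNeighborSampler,
}


def make_neighbor_sampler(num_neighbors, backend="default"):
    """Create a sampler for the requested backend, failing early on unknown names."""
    try:
        sampler_cls = _SAMPLER_BACKENDS[backend]
    except KeyError:
        supported = ", ".join(sorted(_SAMPLER_BACKENDS))
        raise ValueError(
            f"Unknown backend '{backend}' (supported: {supported})"
        ) from None
    return sampler_cls(num_neighbors)


default_sampler = make_neighbor_sampler([10, 5])
graphbolt_sampler = make_neighbor_sampler([10, 5], backend="graphbolt")
```

A loader could call such a helper from its constructor, so that existing code relying on the default backend keeps its current behavior.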
Examples
[ ] Provide an e2e example for GPU-based sampling via backend="graphbolt"