pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Multi-processor NeighborSampler feature request #488

Open duncster94 opened 5 years ago

duncster94 commented 5 years ago

🚀 Feature

Multi-processor support for NeighborSampler.

Motivation

In my application, I use NeighborSampler to train on large graphs, but I'm finding it very slow. On smaller graphs, where I can load the full graph into GPU memory and skip NeighborSampler entirely, run-time drops by an order of magnitude or more. If it's possible to parallelize NeighborSampler, I think the speedup would be substantial.

Thanks for the consideration.

rusty1s commented 5 years ago

Yes, a num_workers argument is in the pipeline, although I personally find the NeighborSampler already quite fast :) Are you sure that this is the bottleneck? Would be interesting to see some benchmarks on this.

duncster94 commented 5 years ago

Interesting. It could be something I'm doing, I'll profile my code and get back to you!

duncster94 commented 5 years ago

You're right, it's not the NeighborSampler after all.

I'm using a graph autoencoder to get unsupervised node embeddings, where I try to reconstruct the input graph structure in the output. Since I'm batching, the loss function sees a subgraph each time it's called. Obtaining this subgraph is the slow part (my own function, nothing to do with pytorch-geometric).

This brings up a good point though: say for example I have a DataFlow object that looks like (128<-543<-831). My loss function requires the subgraph corresponding to the batch (i.e. a 128x128 graph). How can I efficiently obtain this subgraph from the DataFlow object?

Thanks again!

rusty1s commented 5 years ago

So you are trying to reconstruct the subgraph from the DataFlow object? Then the graph you are trying to reconstruct does not have 128 nodes, but rather up to 128+543+831 nodes, assuming there are no duplicated nodes across the bipartite graphs (which is highly unrealistic). This is a tricky one, and I assume it needs a more flexible API to allow this kind of functionality (which leads to the discussion here).

duncster94 commented 5 years ago

The output of my model is always a batch_size x batch_size tensor, and the objective is to minimize the loss between output and a batch_size x batch_size tensor target that corresponds to the subgraph given by the nodes in the batch.

The way I'm thinking about this is that if NeighborSampler yields a DataFlow like (4<-10<-12), then my model should only compute features for the 4 nodes at the end of the bipartite graph (and give a 4x4 tensor post dot-product), not the 12 at the beginning. So my problem is: how do I subset data_flow.edge_index to include only edges among the 4 target nodes? If I can get these edges, then I can create a sparse tensor and densify it to get the corresponding 4x4 subgraph.
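The filtering step described above can be sketched as follows. This is a minimal numpy illustration, not the actual DataFlow API; it assumes (hypothetically) that the sampler relabels nodes so that the target/batch nodes occupy indices 0..batch_size-1, in which case restricting edge_index to the batch subgraph is a simple boolean mask:

```python
import numpy as np

def batch_subgraph(edge_index, batch_size):
    # keep only edges whose endpoints both lie among the first
    # `batch_size` (target) nodes of the sampled block; assumes
    # target nodes were relabeled to indices 0..batch_size-1
    src, dst = edge_index
    mask = (src < batch_size) & (dst < batch_size)
    return edge_index[:, mask]

# toy block: 8 sampled nodes, the first 4 are the batch nodes
edge_index = np.array([[0, 1, 2, 5, 3],
                       [1, 0, 3, 2, 7]])
sub = batch_subgraph(edge_index, batch_size=4)

# densify into the 4x4 target the loss compares against
dense = np.zeros((4, 4))
dense[sub[0], sub[1]] = 1.0
```

Edges touching nodes outside the batch (here 5 and 7) are dropped, and the surviving edges directly index the dense batch_size x batch_size target.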

rusty1s commented 5 years ago

Ok, I understand. You can use this function to filter your original edge_index, which has runtime O(E). A faster way would be to first filter out the relevant rows of your adjacency matrix and only then call the function, which should bring the runtime down to O(B*D), with B being the batch size and D the maximum node degree. I think you can filter sparse adjacency matrices row-wise using scipy.
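The row-wise filtering suggested above can be done with scipy's CSR format, which supports efficient row slicing. A minimal sketch with a toy adjacency matrix:

```python
import numpy as np
import scipy.sparse as sp

# toy adjacency over 8 nodes, stored in CSR form
adj = sp.csr_matrix(
    (np.ones(5), ([0, 1, 2, 5, 3], [1, 0, 3, 2, 7])), shape=(8, 8)
)

# restrict to the batch nodes: row slicing a CSR matrix only touches
# the selected rows, so the cost scales with B*D rather than with
# the total number of edges E
batch = np.array([0, 1, 2, 3])
sub = adj[batch][:, batch]   # B x B sparse subgraph
dense = sub.toarray()        # dense target for the loss
```

The second (column) slice then prunes edges leading outside the batch, leaving exactly the B x B subgraph the loss needs.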

duncster94 commented 5 years ago

Nice, thanks for the suggestion. I ended up using the subgraph function from torch_geometric.utils since I need to keep track of edge attributes as well. This runs significantly faster now - thanks for the help!
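For readers following along: torch_geometric.utils.subgraph restricts an edge_index (and its edge_attr) to a node subset, optionally relabeling the nodes. As a rough numpy sketch of what it does (not the actual implementation):

```python
import numpy as np

def subgraph_with_attrs(node_idx, edge_index, edge_attr, relabel=True):
    # keep edges whose endpoints are both in `node_idx`, carry the
    # matching rows of `edge_attr` along, and optionally relabel
    # nodes to 0..len(node_idx)-1 (mirrors the relabel_nodes flag)
    node_idx = np.asarray(node_idx)
    mask = np.isin(edge_index[0], node_idx) & np.isin(edge_index[1], node_idx)
    ei, ea = edge_index[:, mask], edge_attr[mask]
    if relabel:
        remap = np.full(edge_index.max() + 1, -1)
        remap[node_idx] = np.arange(len(node_idx))
        ei = remap[ei]
    return ei, ea

edge_index = np.array([[0, 2, 5],
                       [2, 0, 7]])
edge_attr = np.array([0.5, 0.25, 0.9])
ei, ea = subgraph_with_attrs([0, 2], edge_index, edge_attr)
```

Carrying edge_attr through the same mask is what makes this preferable when edge weights or features must survive the subsetting.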

rusty1s commented 5 years ago

Ah, makes sense as well :)