ayasar70 opened this issue 2 years ago
Yes, these two operations are the bottleneck. I think one may be able to scale this onto GPUs by working on slices, i.e. processing a subset of feature columns at a time. This should scale SIGN to both single-GPU and multi-GPU setups without the need for a multi-GPU SpMM. What do you think?
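A minimal sketch of that column-sliced approach (the function name, chunk size, and chunking strategy here are mine, not from the SIGN transform), assuming the normalized sparse adjacency fits on one GPU while the dense feature matrix does not:

```python
import torch

def sign_spmm_column_sliced(adj, x, K, chunk_size=1024, device='cuda'):
    """Compute [A @ x, A^2 @ x, ..., A^K @ x] on the GPU, one column slice at a time.

    adj: normalized adjacency as a CPU torch.sparse_coo_tensor of shape (N, N)
    x:   dense CPU feature matrix of shape (N, F)
    Only the adjacency and one (N, chunk_size) feature slice live on the GPU at once.
    """
    adj = adj.coalesce().to(device)
    outs = [torch.empty_like(x) for _ in range(K)]
    for start in range(0, x.size(1), chunk_size):
        cols = slice(start, min(start + chunk_size, x.size(1)))
        h = x[:, cols].to(device)               # move only this feature slice
        for k in range(K):
            h = torch.sparse.mm(adj, h)         # SpMM on the GPU
            outs[k][:, cols] = h.cpu()          # stream the k-hop result back
    return outs
```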
Agreed. Column-wise slicing can be expensive, though. A multi-GPU SpMM can also be implemented using the same approach. In either case, CPU-GPU bandwidth and the slicing itself are going to be the bottleneck. I am investigating this problem with both Python/PyTorch-based and C++-based solutions and will update you if I observe a good speedup.
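For the multi-GPU case, the same slicing idea could look roughly like this (a sketch that assumes each GPU can hold a full copy of the sparse adjacency; actually overlapping the host-device transfers would need pinned memory and separate streams, which is exactly where the CPU-GPU bandwidth concern bites):

```python
import torch

def multi_gpu_spmm(adj, x, devices=('cuda:0', 'cuda:1')):
    # One SpMM hop, split column-wise across GPUs: every device gets a full
    # copy of the sparse adjacency and a disjoint slice of feature columns.
    adjs = [adj.coalesce().to(d) for d in devices]
    chunks = torch.chunk(x, len(devices), dim=1)
    outs = [torch.sparse.mm(a, c.to(d)) for a, c, d in zip(adjs, chunks, devices)]
    return torch.cat([o.cpu() for o in outs], dim=1)
```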
🚀 The feature, motivation and pitch
Hello, I have been working with the SIGN model. In a GPU-based setting, it seems the preprocessing has two bottlenecks: SparseTensor creation and the SpMM (https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/transforms/sign.py#L50). Both operations run on the CPU, and large graphs of course cannot fit into a single GPU's memory.
Do you think moving this computation to multiple GPUs would be helpful? If so, I can work on a C++ extension that takes the rows, columns, and feature matrix and outputs the K layers' SpMM results.
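Roughly, the contract I have in mind is the following pure-PyTorch reference (all names illustrative); the C++/multi-GPU extension would be a faster implementation of the same thing:

```python
import torch

def k_hop_spmm_reference(rows, cols, values, x, K):
    # Reference for the proposed op: build the (normalized) adjacency from COO
    # inputs and return [A @ x, A @ (A @ x), ...] for K hops.
    n = x.size(0)
    adj = torch.sparse_coo_tensor(torch.stack([rows, cols]), values, (n, n)).coalesce()
    outs, h = [], x
    for _ in range(K):
        h = torch.sparse.mm(adj, h)
        outs.append(h)
    return outs
```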
Best
Alternatives
No response
Additional context
No response