Open wawltor opened 3 years ago
The usage of CUDA streams can parallelize device memory transfers (via `*.to(device, non_blocking=True)`) and actual GPU kernel execution. That's exactly what we are using it for.
From what I know, it doesn't seem possible to parallelize GPU kernel executions using multiple CUDA streams in PyTorch; at least, I haven't had any success in doing so yet.
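The overlap described above can be sketched as follows. This is a minimal illustration of the general pattern (pinned host memory, a side stream for the copy, and a stream sync before the result is consumed), not code from the PyGAS source; all tensor names and sizes are made up:

```python
import torch

def overlapped_transfer_and_compute():
    """Overlap an async host-to-device copy with kernels on the default stream."""
    if not torch.cuda.is_available():
        return None  # nothing to demonstrate without a GPU

    device = torch.device('cuda')
    copy_stream = torch.cuda.Stream()

    # Pinned (page-locked) host memory is required for a truly async copy.
    src = torch.randn(512, 512, pin_memory=True)
    work = torch.randn(512, 512, device=device)

    # Enqueue the transfer on the side stream.
    with torch.cuda.stream(copy_stream):
        dst = src.to(device, non_blocking=True)

    # Meanwhile, kernels issued on the default stream can run concurrently
    # with the copy above.
    for _ in range(8):
        work = work @ work

    # Make the default stream wait for the copy before using `dst`.
    torch.cuda.current_stream().wait_stream(copy_stream)
    return dst + work

out = overlapped_transfer_and_compute()
```

Note that without pinned memory, `non_blocking=True` silently degrades to a synchronous copy, so no overlap happens.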
@rusty1s is there a problem in the code? The `dst` tensor is CUDA memory and the `src` tensor is CPU memory, yet the copy direction is `cudaMemcpyDeviceToHost`.
Thanks for reporting. This is indeed wrong, and I have fixed it. Luckily, I can confirm that the code has been working correctly anyway.
Hi, this is great work for large-scale GNN training, thank you. I have a question: it seems that CUDA streams cannot run in parallel in PyTorch (see https://github.com/pytorch/pytorch/issues/25540). Are there any tricks in PyGAS that work around this?