Open wawltor opened 3 years ago
The usage of CUDA streams can parallelize device memory transfers (via `*.to(device, non_blocking=True)`) and actual GPU kernel execution. That's exactly what we are using it for.
From what I know, it doesn't seem possible to parallelize GPU kernel executions using multiple CUDA streams in PyTorch; at least, I haven't had any success in doing so yet.
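The overlap described above can be sketched as follows. This is a minimal illustration of the general pattern (pinned host memory, a side stream for the copy, and a stream sync before the result is consumed), not code from the PyGAS source; all tensor names and sizes are made up:

```python
import torch

def overlapped_transfer_and_compute():
    """Overlap an async host-to-device copy with kernels on the default stream."""
    if not torch.cuda.is_available():
        return None  # nothing to demonstrate without a GPU

    device = torch.device('cuda')
    copy_stream = torch.cuda.Stream()

    # Pinned (page-locked) host memory is required for a truly async copy.
    src = torch.randn(512, 512, pin_memory=True)
    work = torch.randn(512, 512, device=device)

    # Enqueue the transfer on the side stream.
    with torch.cuda.stream(copy_stream):
        dst = src.to(device, non_blocking=True)

    # Meanwhile, kernels issued on the default stream can run concurrently
    # with the copy above.
    for _ in range(8):
        work = work @ work

    # Make the default stream wait for the copy before using `dst`.
    torch.cuda.current_stream().wait_stream(copy_stream)
    return dst + work

out = overlapped_transfer_and_compute()
```

Note that without pinned memory, `non_blocking=True` silently degrades to a synchronous copy, so no overlap happens.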
@rusty1s is there a problem in the code? The `dst` tensor is CUDA memory and the `src` tensor is CPU memory, yet the copy direction is `cudaMemcpyDeviceToHost`.
Thanks for reporting. This is indeed wrong, and I have fixed it. Luckily, I can confirm that the code has been working correctly anyway.
Hi, this is great work for large-scale GNN training, thank you. I have a question: it seems that CUDA streams cannot run in parallel in PyTorch (see https://github.com/pytorch/pytorch/issues/25540). Are there any tricks in PyGAS that work around this?