intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[SYCL] Unnecessary read_write dependencies when multiple devices read same buffer #2053

Status: Open. Opened by mfbalin 4 years ago.

mfbalin commented 4 years ago

I have some code that launches multiple kernels and distributes them across multiple queues bound to different CUDA devices. When only one GPU is used, we get the following dependency graph: [figure: dep_graph]

When the kernels are distributed among different devices, we instead get the following graph: [figure: dep_graph_multi]

I would expect the graph not to change much; in particular, I would expect no dependencies between the different "tri_kernel" kernels, since they access the buffers in read-only mode and those buffers are shared across kernel launches. In the multi-device dependency graph, even though I am using multiple devices, I observe that only one of them runs at a time because of these dependencies.
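The original reproducer is linked further down rather than inlined, but a minimal sketch of the pattern being described could look like the following (SYCL 2020 syntax; the kernel body, sizes, and names such as `shared` and `outputs` are illustrative, not taken from the actual code):

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  std::vector<float> host(N, 1.0f);

  // One input buffer shared by every kernel launch.
  sycl::buffer<float> shared(host.data(), sycl::range<1>(N));

  // One queue per GPU device.
  std::vector<sycl::queue> queues;
  for (const auto &dev :
       sycl::device::get_devices(sycl::info::device_type::gpu))
    queues.emplace_back(dev);

  // One private output buffer per queue, so the only object shared
  // between submissions is `shared`.
  std::vector<sycl::buffer<float>> outputs;
  for (size_t i = 0; i < queues.size(); ++i)
    outputs.emplace_back(sycl::range<1>(N));

  for (size_t i = 0; i < queues.size(); ++i) {
    queues[i].submit([&](sycl::handler &cgh) {
      // Read-only access to the shared buffer: submissions on different
      // queues should not have to serialize on it.
      sycl::accessor in(shared, cgh, sycl::read_only);
      sycl::accessor out(outputs[i], cgh, sycl::write_only, sycl::no_init);
      cgh.parallel_for(sycl::range<1>(N),
                       [=](sycl::id<1> idx) { out[idx] = in[idx] * 2.0f; });
    });
  }
  for (auto &q : queues)
    q.wait();
  return 0;
}
```

The expectation is that, because every accessor to `shared` requests `read_only` access, the runtime should record no edges between the per-queue kernels and let them run concurrently on their respective devices.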

romanovvlad commented 3 years ago

@mfbalin Could you please share the reproducer? Are these queues bound to the same context?

mfbalin commented 3 years ago

code

mfbalin commented 3 years ago

They should be bound to the same context, I assume it is the CUDA context. However, you can verify that from the code.
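For reference, a sketch of how queues can be bound to one explicit context rather than relying on a default per-device context (note this is an illustration, not the issue's reproducer, and whether a single context may actually span multiple CUDA devices depends on the backend):

```cpp
#include <sycl/sycl.hpp>
#include <cassert>
#include <vector>

int main() {
  auto gpus = sycl::device::get_devices(sycl::info::device_type::gpu);

  // A single context spanning all GPUs. Constructing each queue from
  // (context, device) guarantees the queues share this context, which the
  // runtime needs in order to build one dependency graph across them.
  sycl::context ctx(gpus);

  std::vector<sycl::queue> queues;
  for (const auto &dev : gpus)
    queues.emplace_back(ctx, dev);

  // Sanity check: all queues report the same context.
  for (auto &q : queues)
    assert(q.get_context() == ctx);
  return 0;
}
```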

mfbalin commented 3 years ago

Using multiple queues bound to different CUDA devices doesn't decrease the runtime at all, since the GPUs are never actually used in parallel.

github-actions[bot] commented 3 days ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.