Open mfbalin opened 4 years ago
@mfbalin Could you please share the reproducer? Are these queues bound to the same context?
They should be bound to the same context; I assume that is the CUDA context. You can verify that from the code, though.
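One way to verify this is to compare the contexts of the queues directly. A minimal sketch, assuming SYCL 2020 and a system with GPU devices (the queue names are placeholders, not from the reproducer):

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  // Two queues constructed independently; depending on the implementation,
  // each may get its own default context.
  sycl::queue q1{sycl::gpu_selector_v};
  sycl::queue q2{sycl::gpu_selector_v};
  std::cout << std::boolalpha
            << "same context: " << (q1.get_context() == q2.get_context())
            << "\n";

  // To guarantee a shared context, construct both queues from one
  // explicitly created sycl::context spanning the devices of interest.
  sycl::context ctx{sycl::device::get_devices(sycl::info::device_type::gpu)};
  sycl::queue q3{ctx, ctx.get_devices().front()};
}
```

Note that whether independently constructed queues share a context is implementation-defined, which is exactly why checking `get_context()` equality in the reproducer is worthwhile.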
Using multiple queues bound to different CUDA devices doesn't decrease the runtime at all, since the different GPUs are never actually used in parallel.
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.
I have some code that launches multiple kernels and distributes them across multiple queues bound to different CUDA devices. When only one GPU is used, we get the following dependency graph:
When the kernels are distributed among different devices, then we get the following graph:
I would expect the graph not to change much, and in particular I would expect no dependencies between the different "tri_kernel" kernels: the buffers they access are shared between the kernel launches, but they are only ever accessed in read-only mode. Yet in the latter dependency graph, even though I am using multiple devices, I observe that only one device runs at a time because of these dependencies.
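The pattern above can be sketched as follows. This is a hedged reconstruction, not the actual reproducer: the kernel body, buffer size, and queue setup are assumptions; only the read-only sharing of one buffer across queues on different devices reflects the report. Since all accesses are read-after-read, the runtime should in principle be free to run the submissions concurrently:

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  auto gpus = sycl::device::get_devices(sycl::info::device_type::gpu);

  std::vector<float> host(1024, 1.0f);
  sycl::buffer<float> shared{host.data(), sycl::range{1024}};

  // One queue per CUDA device.
  std::vector<sycl::queue> queues;
  for (auto &d : gpus) queues.emplace_back(d);

  for (auto &q : queues) {
    q.submit([&](sycl::handler &cgh) {
      // Read-only accessor: read-after-read carries no dependency, so
      // these submissions should only be ordered after prior writes to
      // `shared`, not after each other.
      sycl::accessor in{shared, cgh, sycl::read_only};
      cgh.parallel_for(sycl::range{1024}, [=](sycl::id<1> i) {
        float x = in[i];  // placeholder for the real tri_kernel work
        (void)x;
      });
    });
  }
  for (auto &q : queues) q.wait();
}
```

If a graph like this still shows edges between the kernels, the serialization is coming from the runtime's buffer-dependency tracking (or from implicit cross-device data movement) rather than from the access modes requested by the program.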