Closed abagusetty closed 2 years ago
hi @abagusetty
No. The user is expected to create a command queue, to which the list is later sent for execution. The only time a queue is created implicitly is when using an immediate list. In that case, the immediate list acts as both list and queue, so just creating the immediate list and then appending the copy operation is enough to offload the copy to the accelerator.
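The two submission paths described above can be sketched with a toy model. This is not Level Zero code: `ToyQueue`, `ToyCommandList`, `ToyImmediateList`, `g_queues_created` and both helper functions are hypothetical names invented for illustration, and the comments only point at the Level Zero entry points each step loosely corresponds to. The counter tracks every queue object that comes into existence, whether the user created it or an immediate list created it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Toy model only: hypothetical stand-ins, not the Level Zero API.
int g_queues_created = 0;  // counts explicit and implicit queue creations

struct ToyQueue {
    ToyQueue() { ++g_queues_created; }
    void submit(const std::function<void()>& cmd) { cmd(); }
};

// Regular list: only records commands; nothing runs until a queue executes it.
struct ToyCommandList {
    std::vector<std::function<void()>> cmds;
    void appendMemoryCopy(void* dst, const void* src, std::size_t bytes) {
        assert(dst != nullptr && src != nullptr);
        cmds.push_back([=] { std::memcpy(dst, src, bytes); });  // no queue involved
    }
    void executeOn(ToyQueue& q) {
        for (const auto& c : cmds) q.submit(c);
        cmds.clear();
    }
};

// Immediate list: owns an implicitly created queue and submits on append.
struct ToyImmediateList {
    ToyQueue implicitQueue;
    void appendMemoryCopy(void* dst, const void* src, std::size_t bytes) {
        assert(dst != nullptr && src != nullptr);
        implicitQueue.submit([=] { std::memcpy(dst, src, bytes); });
    }
};

// Path 1: the user creates the queue. Returns how many queues this path created.
int copyViaRegularList(int* dst, const int* src, std::size_t count) {
    const int before = g_queues_created;
    ToyQueue queue;       // cf. zeCommandQueueCreate (explicit, done by the user)
    ToyCommandList list;  // cf. zeCommandListCreate
    list.appendMemoryCopy(dst, src, count * sizeof(int));  // cf. zeCommandListAppendMemoryCopy
    list.executeOn(queue);  // cf. zeCommandListClose + zeCommandQueueExecuteCommandLists
    return g_queues_created - before;
}

// Path 2: the queue comes with the immediate list. Returns queues created.
int copyViaImmediateList(int* dst, const int* src, std::size_t count) {
    const int before = g_queues_created;
    ToyImmediateList list;  // cf. zeCommandListCreateImmediate
    list.appendMemoryCopy(dst, src, count * sizeof(int));
    return g_queues_created - before;
}
```

In this model the append call never creates a queue in either path. The queue is either created explicitly by the caller before execution, or it is created as part of the immediate list.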
Hey @jandres742
thanks for your response. I was mostly profiling an app using onetrace and have created a reproducer, shown below. In short, the reproducer creates 2 queues and issues a sycl::memcpy.
Adding a memcpy op creates a command queue, and I was curious whether this is expected in all cases. Going by your response above, does the code below create an immediate list and hence implicitly create a command queue?
Some of the profiling output is truncated.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>
int main(int argc, char **argv)
{
    std::vector<sycl::device> allDevices = sycl::device::get_devices(sycl::info::device_type::gpu);
    auto subDevices = allDevices[0].create_sub_devices<sycl::info::partition_property::partition_by_affinity_domain>(sycl::info::partition_affinity_domain::numa);
    sycl::device *targetDev = new sycl::device(subDevices[0]);
    sycl::context *targetContext = new sycl::context(*targetDev);
    const int nQueues = 2;
    // size_t rather than int: max_mem_alloc_size is a 64-bit byte count, so the element count can exceed INT_MAX
    const size_t n = targetDev->get_info<sycl::info::device::max_mem_alloc_size>() / (2*sizeof(float));
    std::cout << "n : " << n << ", (GBs) : " << n*sizeof(float) * 1.0e-09 << std::endl;
    // create the in-order queues
    sycl::queue *queue[nQueues];
    for (int i = 0; i < nQueues; ++i) {
        queue[i] = new sycl::queue(*targetContext, *targetDev, sycl::property_list{sycl::property::queue::in_order{}});
    }
    // allocate host memory and device memory
    float *h_a = new float[n];
    float *d_a = sycl::malloc_device<float>(n, *targetDev, *targetContext); // device
    // memcpy is asynchronous, so wait for the copy to finish before tearing down
    queue[0]->memcpy(d_a, h_a, n*sizeof(float)).wait();
    // release device memory, host memory, and the heap-allocated SYCL objects
    sycl::free(d_a, *targetContext);
    delete[] h_a;
    for (int i = 0; i < nQueues; ++i) delete queue[i];
    delete targetContext;
    delete targetDev;
    return 0;
}
@abagusetty thanks. Your example is DPC++ code, so whether and how the queue is created depends on how the DPC++ and SYCL runtimes implement the code with L0 calls.
So, in this case, your original question should be rephrased from
Would invoking a zeCommandListAppendMemoryCopy create an additional command queue
to
Would invoking a sycl::memcpy create an additional command queue
From the point of view of L0, either an L0 queue needs to be created to execute the list to which the zeCopy operation has been appended, or the zeCopy needs to go to an immediate command list. Which of the two the code above ends up using depends on how the DPC++ runtime implements and translates the code.
I would suggest moving this issue to https://github.com/intel/llvm, where they can explain exactly the process they use to create L0 queues from DPC++ code like the one you shared above.
Thanks @jandres742. Closing the issue here and moving to intel/llvm
Would invoking a zeCommandListAppendMemoryCopy create an additional command queue (i.e., a call to zeCommandQueueCreate) implicitly?