Open inducer opened 7 years ago
I tried the quick hack below, but I am getting INVALID_COMMAND_QUEUE later on. There is no error upon calling clCreateCommandQueue, though. Setting an invalid QUEUE_SIZE does result in INVALID_QUEUE_VALUE, however: for example, on AMD, creating a queue of 16 MB when 8 MB is the maximum.
From stackoverflow https://stackoverflow.com/questions/45767759/how-to-set-device-side-queue-size-in-pyopencl/49957843#49957843
Sorry, I don't have the spare cycles at this moment to investigate in detail. I've put this on my list for later in the summer.
Will definitely check this out soon - thanks for adding support for this.
Kicking this slightly back to life: did you ever manage to get a device-side queue working through PyOpenCL? I have spent the better part of today trying to make this work, but the closest I've gotten is the queue being created and a "clEnqueueNDRangeKernel failed: INVALID_COMMAND_QUEUE" being thrown at me when I try to enqueue a trivial kernel (that does nothing).
What ICD (OpenCL driver) are you using?
Edit: PEBCAK
The ranting below is because I didn't understand that you can't enqueue to a device-side queue from the host side. You need two queues: one on the host, one on the device. You can mark the device queue as the default.
I've tried both the Nvidia (OpenCL 1.2) and Intel (OpenCL 2.1) runtimes. The method complains about incompatibility when I use Nvidia, of course.
In both cases I create the queue like this:
cl.CommandQueue(self._cl_context, properties=cmcq.ON_DEVICE | cmcq.ON_DEVICE_DEFAULT | cmcq.OUT_OF_ORDER_EXEC_MODE_ENABLE)
and...
cl.CommandQueue(self._cl_context, properties=[cmq.PROPERTIES, cmcq.ON_DEVICE | cmcq.ON_DEVICE_DEFAULT | cmcq.OUT_OF_ORDER_EXEC_MODE_ENABLE, cmq.SIZE, 1024])
leads to
pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_COMMAND_QUEUE
Actually, I lie: on Nvidia this leads to a segfault, though I have read that the ...withProperties() function is supported now.
If I remove this and simply make an in-order, on-host queue (the default), the kernel runs fine...
Thanks for following up! Just to be clear: did you get things to work on Intel? (I'd expect that to work sooner than I'd expect the same of Nvidia.)
Eh!
It's complicated. I am definitely able to create on-device queues on both the Intel and Nvidia platforms. I have made the following observations:
Using the ...withProperties() call is required for doing this on Nvidia. On Intel I can use both calls and it works, but only on certain cards: my desktop has a 1660 and it doesn't work (OUT_OF_RESOURCES error), while the same code works on a Tesla V100. I also have an AMD card that throws "out of host memory" when I try to make the second queue using the withProperties() function, but I am able to use the 'normal' clCreateCommandQueue().
I can enqueue_kernel() on both Intel and Nvidia. BUT, on both platforms I get hangs if I do not turn off code caching. No idea why.
Thanks for reporting back! Could you share some example code? I'd like to include that in the tests, if for no other reason than to make sure that the things that are working stay working.
This would be the pattern to follow: https://github.com/inducer/pyopencl/blob/21b09e316b00765d9c1612d4ad6b078003939049/src/c_wrapper/command_queue.cpp#L42-L66