Closed tangjinchuan closed 1 year ago
Dear MS team, I would like to give my observe about this problem. This problem still exists with latest version 1.2112.2.0 as published 16 days ago.
One important thing I discovered was that If a kernel have been enqueued twice or more (e.g., for loop in cat kernel submission), and if none of it is being executed at the device side, then this OpenCLOn12 runtime will ignore the previously queued kernel but only leave the latest one who has the same kernel name. This causes the incorrect results while doing examples like ocl_cat(1,a,a,a).
To fix this, we have to add clFlush between two kernel submissions if their names are the same.
For more info: https://sourceforge.net/p/octave-ocl/discussion/general/thread/b47bc925d3/
Hi, please see the attached test case to reproduce the results. I create a simple opencl program with codeblocks to simulate a catting-two-array behavior. Array 'a' has a length of 3 while array 'c' has alength of 6. The former is initialized to 1 while the latter is initialized to 0. Then, the program issues two kernels by a loop to concat two array 'a' into one array c. Then the program sums the whole array c, which should produce 6. 6 is obtained by using intel opencl gpu driver, while 3 is obtained with openclondx12. Note: to select the correct gpu device (like openclondx12 driver), you need to change line 83 to the correct platform id: err = clGetDeviceIDs(cpPlatform[0], CL_DEVICE_TYPE_GPU, 1, &device_id, NULL); I tried other kernels like adding several times they issued with for loop and have no problem. Thank you very much! Best wishes, Jinchuan Tang ts.zip ts.zip
Sorry for the long delay on this. This was a good bug. It was broken by https://github.com/microsoft/OpenCLOn12/commit/56608692b1ff75dde5b5345762d9eeeaac24597d, which was in turn a fix for a Photoshop effect which had incorrect rendering. By moving the constant buffer data upload from the app thread kernel enqueue, to the worker thread submitting to D3D, I was no longer capturing the app's private
memory-space kernel args at the right time, so both kernels ended up with the last one's args.
Dear MS OpenCL team,
I tried to use your OpenCL driver on DX 12 devices with Octave 6.3 + Octave-Ocl OpenCL package (An opensource alternative to Matlab's GPU parallel toolbox, such as dropin replacement for gpuArray and many other functions.https://gnu-octave.github.io/packages/ocl) However, it could not pass the test with the help of cmd :ocl_tests or cat function offered by Octave-Ocl.
After install the ocl package with pkg install from https://gnu-octave.github.io/packages/ocl. You can use
For example :
aa = 2-dimensional OCL array (3x5) of class single (float)
ans = 2-dimensional OCL array (6x5) of class single (float)
ans =
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The result is not right.
Best wishes, Jinchuan Tang