Can you run clinfo on this machine? I suspect Apple's driver does not like a work-group (local) size of 256.
clinfo should tell you the maximum work-group size. I think some Mac OS X machines report 1024 when the actual limit is 128.
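To make the suspicion concrete, here is a minimal sketch of the two OpenCL 1.x launch rules that typically produce an "invalid work group size" error for a 1D range: the local size must not exceed the effective maximum work-group size, and the global size must be an exact multiple of the local size. The class and method names are hypothetical, and the limit is passed in by hand rather than queried from a real device.

```java
/** Hypothetical helper mirroring two OpenCL 1.x launch rules for a 1D range. */
final class LocalSizeCheck {
    /**
     * Returns true only if:
     *  - 0 < localSize <= maxWorkGroupSize
     *  - globalSize is an exact multiple of localSize
     */
    static boolean isLaunchable(int globalSize, int localSize, int maxWorkGroupSize) {
        if (localSize <= 0 || localSize > maxWorkGroupSize) {
            return false;
        }
        return globalSize % localSize == 0;
    }

    public static void main(String[] args) {
        int actualLimit = 128; // assumed real limit, per the suspicion above
        System.out.println(isLaunchable(1024, 256, actualLimit)); // false: 256 > 128
        System.out.println(isLaunchable(1024, 128, actualLimit)); // true
    }
}
```

If a device reports 1024 but really enforces 128, a check like this against the reported value would pass while the actual launch fails.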
To test this, instead of using
kernel.execute(yourSize);
create a Range with a fixed local size:
Range range = Range.create(yourSize, 128);
kernel.execute(range);
This fixes the local (work-group) size at 128 rather than using the default.
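The fixed-size workaround can be generalized: instead of hard-coding 128, pick the largest local size that stays under a known-safe cap and still divides the global size evenly. This is a hypothetical helper, not Aparapi's own default-size logic; the cap of 128 is simply the value that worked in this report.

```java
/** Hypothetical sketch: pick a 1D local size that never exceeds a cap. */
final class LocalSizePicker {
    /**
     * Returns the largest local size <= cap that divides globalSize evenly,
     * so the (global, local) pair always satisfies the "multiple" rule.
     * Falls back to 1, which divides every global size.
     */
    static int pickLocalSize(int globalSize, int cap) {
        if (globalSize <= 0 || cap <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        for (int candidate = Math.min(globalSize, cap); candidate > 1; candidate--) {
            if (globalSize % candidate == 0) {
                return candidate;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(pickLocalSize(1024, 128)); // 128
        System.out.println(pickLocalSize(1000, 128)); // 125
        System.out.println(pickLocalSize(97, 128));   // 97 (whole range fits in one group)
    }
}
```

The result could then be passed as the second argument of Range.create(yourSize, localSize).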
Gary
Original comment by frost.g...@gmail.com
on 28 May 2012 at 7:12
Setting the local size to 128 did fix the issue, thanks!
About running clinfo: is there a way to do this using Aparapi?
Original comment by misja.a...@gmail.com
on 29 May 2012 at 9:47
Glad to hear that setting the local size manually fixed this. This is obviously
more of a workaround than a fix, but hopefully it will move you forward.
Regarding running clinfo from Aparapi:
You cannot currently do this with any of the binary downloads. If you are building from the trunk, the new Device class exposes enough information to extract the actual global and local sizes.
So with the trunk code you could use something like
Device device = Device.bestGPU();
Range range = device.createRange(yourSize);
to create a Range suitable for a specific device, one that 'honors' the limits imposed by the device and the OpenCL runtime.
However, this is all still relatively new and not quite ready for prime time.
I will keep this issue open until we create a new binary distribution.
Gary
Original comment by frost.g...@gmail.com
on 29 May 2012 at 1:04
I built Aparapi from the trunk this time, and now I don't even need to set a range or local size anymore: the squares sample app runs without any errors!
Your trunk code is neatly organized, by the way. I could even build it on my Mac without any problems.
I checked the maximum work-group size that the Device class reports for my GPU, and it is 1024.
(The maximum number of dimensions is 3, and the maximum number of work-items is 1024 in every dimension.)
And indeed, I can even run the squares demo with a range size of 1024.
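The per-dimension limit and the overall work-group limit are separate constraints: 1024 work-items per dimension does not allow, say, a 64 × 32 group when the maximum work-group size is 1024, because the product of the local sizes must also stay within that maximum. The sketch below is a hypothetical check over reported limits (names are illustrative). It can only validate against what the device reports; in OpenCL the per-kernel limit (CL_KERNEL_WORK_GROUP_SIZE) can also be lower than the device-wide maximum, and this check knows nothing about that.

```java
/** Hypothetical sketch of the per-dimension and total work-group limits. */
final class NdRangeCheck {
    /**
     * Checks an N-dimensional launch against reported device limits:
     *  - each local[i] must be in 1..maxWorkItemSizes[i]
     *  - each global[i] must be a multiple of local[i]
     *  - the product of all local[i] must be <= maxWorkGroupSize
     */
    static boolean fits(int[] global, int[] local, int[] maxWorkItemSizes, int maxWorkGroupSize) {
        if (global.length != local.length || local.length > maxWorkItemSizes.length) {
            return false;
        }
        long product = 1; // long avoids overflow for large 3D groups
        for (int i = 0; i < local.length; i++) {
            if (local[i] <= 0 || local[i] > maxWorkItemSizes[i]) {
                return false;
            }
            if (global[i] % local[i] != 0) {
                return false;
            }
            product *= local[i];
        }
        return product <= maxWorkGroupSize;
    }

    public static void main(String[] args) {
        int[] maxItems = {1024, 1024, 1024}; // limits reported in this thread
        int maxGroup = 1024;
        System.out.println(fits(new int[]{1024}, new int[]{1024}, maxItems, maxGroup));     // true
        System.out.println(fits(new int[]{64, 64}, new int[]{64, 32}, maxItems, maxGroup)); // false: 64 * 32 = 2048
    }
}
```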
Could something else have been wrong with the binary distribution, causing the error? Maybe it was mixing up my two video cards somehow?
Original comment by misja.a...@gmail.com
on 30 May 2012 at 7:07
Hmm... I wish I could claim credit for fixing this... :)
It might be because we now use the device infrastructure underneath.
Did you by any chance update your OpenCL driver?
The trunk code is different: it now uses Device.firstGPU() under the hood (unless the Range was created via another device).
What devices do you have? Two identical cards?
You may be able to help me test something ;)
If you have a build from the trunk, you can try this:
Device device = Device.firstGPU();
Range range = device.createRange(1024);
kernel.execute(range);
This is what Aparapi now tries to do by default. Note that when you create a Range via a device, the Range is bound to that device and is 'guaranteed' (well, highly likely ;) ) to have compatible group sizes.
You can also select the best GPU (which might be the first!)
Device device = Device.best();
Range range = device.createRange(1024);
kernel.execute(range);
You can also print the device info:
if (device instanceof OpenCLDevice) {
   // getPlatform().getVendor() reports the vendor of the OpenCL platform (runtime)
   System.out.println("vendor =" +
         ((OpenCLDevice) device).getPlatform().getVendor());
}
You can also create your own filter (say, for picking the first AMD device ;) ) using the DeviceComparitor interface.
Here is the code for best():
public static Device best() {
   return (OpenCLDevice.select(new DeviceComparitor() {
      @Override public OpenCLDevice select(OpenCLDevice _deviceLhs, OpenCLDevice _deviceRhs) {
         // Different device types: prefer whichever one is a GPU
         if (_deviceLhs.getType() != _deviceRhs.getType()) {
            if (_deviceLhs.getType() == TYPE.GPU) {
               return (_deviceLhs);
            } else {
               return (_deviceRhs);
            }
         }
         // Same type: prefer the device with more compute units
         if (_deviceLhs.getMaxComputeUnits() > _deviceRhs.getMaxComputeUnits()) {
            return (_deviceLhs);
         } else {
            return (_deviceRhs);
         }
      }
   }));
}
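For readers without an OpenCL setup, the pairwise-selection idea behind best() can be sketched in plain Java. DeviceInfo, its fields, and the select helper are hypothetical stand-ins, not Aparapi's OpenCLDevice or DeviceComparitor, and the device values are made up for illustration. The fold keeps whichever device the chooser returns for each pair; Aparapi's own select may iterate differently. FIRST_AMD is an example of the kind of custom filter mentioned above.

```java
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical stand-ins for device selection; not Aparapi's classes. */
final class DeviceSelectSketch {
    enum Type { GPU, CPU }

    static final class DeviceInfo {
        final String vendor;
        final Type type;
        final int maxComputeUnits;

        DeviceInfo(String vendor, Type type, int maxComputeUnits) {
            this.vendor = vendor;
            this.type = type;
            this.maxComputeUnits = maxComputeUnits;
        }
    }

    /** Folds the list pairwise, keeping whichever device the chooser returns. */
    static DeviceInfo select(List<DeviceInfo> devices, BinaryOperator<DeviceInfo> chooser) {
        DeviceInfo winner = null;
        for (DeviceInfo d : devices) {
            winner = (winner == null) ? d : chooser.apply(winner, d);
        }
        return winner;
    }

    /** Same rule as best(): prefer a GPU, then more compute units. */
    static final BinaryOperator<DeviceInfo> BEST = (lhs, rhs) -> {
        if (lhs.type != rhs.type) {
            return lhs.type == Type.GPU ? lhs : rhs;
        }
        return lhs.maxComputeUnits > rhs.maxComputeUnits ? lhs : rhs;
    };

    /** Custom filter: keep the first device whose vendor mentions AMD. */
    static final BinaryOperator<DeviceInfo> FIRST_AMD = (lhs, rhs) ->
            (lhs.vendor.contains("AMD") || !rhs.vendor.contains("AMD")) ? lhs : rhs;

    public static void main(String[] args) {
        List<DeviceInfo> devices = List.of(
                new DeviceInfo("Intel", Type.GPU, 16),
                new DeviceInfo("AMD", Type.GPU, 12),
                new DeviceInfo("Intel", Type.CPU, 8));
        System.out.println(select(devices, BEST).vendor);      // Intel (GPU with more compute units)
        System.out.println(select(devices, FIRST_AMD).vendor); // AMD
    }
}
```

A vendor-based filter like FIRST_AMD only helps if the vendor string differs per device; a filter keyed on the platform vendor would not distinguish two devices that share one platform.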
Can you try creating a range via a device (as shown above) and
validate that it is working?
Gary
Original comment by frost.g...@gmail.com
on 30 May 2012 at 10:02
No, my two cards are different: one is the Intel graphics integrated into the CPU and the other is an AMD Radeon.
I tried creating the range the way you described. This time I got the error again when running the squares application:
!!!!!!! clEnqueueNDRangeKernel() failed invalid work group size
after clEnqueueNDRangeKernel, globalSize[0] = 1024, localSize[0] = 1024
31-mei-2012 20:23:39 com.amd.aparapi.KernelRunner executeOpenCL
WARNING: ### CL exec seems to have failed. Trying to revert to Java ###
!!!!!!! clEnqueueNDRangeKernel() failed invalid work group size
after clEnqueueNDRangeKernel, globalSize[0] = 1024, localSize[0] = 1024
31-mei-2012 20:23:39 com.amd.aparapi.KernelRunner executeOpenCL
WARNING: ### CL exec seems to have failed. Trying to revert to Java ###
!!!!!!! clEnqueueNDRangeKernel() failed invalid work group size
after clEnqueueNDRangeKernel, globalSize[0] = 1024, localSize[0] = 1024
Same story with a range size of 512. With a range size of 128 it ran without errors, just like with the binary distribution :)
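Since 1024 and 512 failed while 128 worked, one pragmatic pattern is to probe downward from a requested local size until a launch is accepted. In this hypothetical sketch the launch is simulated with a predicate; a real version would need some way to detect that the OpenCL launch failed, which this sketch leaves out.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: halve the local size until a launch is accepted. */
final class LocalSizeFallback {
    /**
     * Tries local = start, start/2, start/4, ... (skipping values that do not
     * divide globalSize) and returns the first one tryLaunch accepts, or -1.
     * tryLaunch stands in for a real kernel launch that reports success.
     */
    static int firstAccepted(int globalSize, int start, IntPredicate tryLaunch) {
        for (int local = start; local >= 1; local /= 2) {
            if (globalSize % local == 0 && tryLaunch.test(local)) {
                return local;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated driver that rejects anything above 128, mimicking this report.
        IntPredicate simulatedDriver = local -> local <= 128;
        System.out.println(firstAccepted(1024, 1024, simulatedDriver)); // 128
    }
}
```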
I tried it with both Device.best() and Device.firstGPU(). I also printed the vendor and name of the device; strangely enough, both give the same output:
'vendor = Apple'
Original comment by misja.a...@gmail.com
on 31 May 2012 at 6:34
Thanks for helping debug this. Clearly I have some work to do.
Apple is your OpenCL vendor (i.e., Apple supplies the OpenCL runtime), so the platform vendor reads 'Apple' whichever device is selected.
Gary
Original comment by frost.g...@gmail.com
on 31 May 2012 at 7:34
I believe this issue is related to
http://code.google.com/p/aparapi/issues/detail?id=86
Original comment by ryan.lam...@gmail.com
on 15 Dec 2012 at 12:15
Original issue reported on code.google.com by
misja.a...@gmail.com
on 28 May 2012 at 1:06