alephman closed this issue 7 years ago
@hughperkins I used aarch64 instead of x86 in the Makefile, but "make run-tests" still fails. I am not familiar with OpenCL and CUDA; do you have any idea or direction for debugging this?
g++ -fPIC -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL
/usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C
/usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed
make: *** [build/test-cocl-cuda_sample] Error 1
After adding the -fPIC parameter, it compiles successfully.
$ ./cuda_sample
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
configureKernel (name=_Z8setValuePfif
kernel build error:
kernel source:
1:
2:
3:
4: kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value) {
5: data = (global float*)((global char *)data + data_offset);
6:
7: label0:;
8: int v1 = get_local_id(0);
9: bool v2 = v1 == 0;
10: if(v2) {
11: goto v4;
12: } else {
13: goto v5;
14: }
15: v4:;
16: long v6 = idx;
17: global float* v7 = (&data[v6]);
18: v7[0] = value;
19: goto v5;
20: v5:;
21: return;
22: }
23:
Invalid kernel name, code -46, kernel _Z8setValuePfif
kernel failed to build
kernel name: [_Z8setValuePfif]
saving kernel soucecode to /tmp/failed-kernel.cl
terminate called after throwing an instance of 'std::runtime_error'
what():
kernel source:
1:
2:
3:
4: kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value) {
5: data = (global float*)((global char *)data + data_offset);
6:
7: label0:;
8: int v1 = get_local_id(0);
9: bool v2 = v1 == 0;
10: if(v2) {
11: goto v4;
12: } else {
13: goto v5;
14: }
15: v4:;
16: long v6 = idx;
17: global float* v7 = (&data[v6]);
18: v7[0] = value;
19: goto v5;
20: v5:;
21: return;
22: }
23:
Invalid kernel name, code -46, kernel _Z8setValuePfif
Aborted
After adding the -fPIC parameter, it compiles successfully.
Nice! :-)
Invalid kernel name, code -46, kernel _Z8setValuePfif
That's a weird error. But before we dig into that, note that it looks like your OpenCL device is a CPU, rather than a GPU:
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Is this what you expect?
1) What does "OpenCL-enabled CPU" mean? I used gpuinfo and clinfo to list the GPU info, and it looks good. Or is there another simple test program to help me check that?
2) I tried running EasyCL's easycl_unittests:
args: ./easycl_unittests --gtest_filter=-SLOW
Note: Google Test filter = -SLOW
[==========] Running 63 tests from 25 test cases.
[----------] Global test environment set-up.
[----------] 1 test from testscalars
[ RUN ] testscalars.test1
found opencl library
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
.....
.....
time: 720.502
[ OK ] teststatefultimer.notiming (1131 ms)
[----------] 2 tests from teststatefultimer (2325 ms total)
[----------] Global test environment tear-down
[==========] 63 tests from 25 test cases ran. (11010 ms total)
[ PASSED ] 58 tests.
[ FAILED ] 5 tests, listed below:
[ FAILED ] testscalars.test1
[ FAILED ] testnewinstantiations.createForFirstGpu
[ FAILED ] testnewinstantiations.createForIndexedGpu
[ FAILED ] testDeviceInfo.basic
[ FAILED ] testDeviceInfo.gpus
Hmmm, certainly seems gpu-like https://imgtec.com/powervr/graphics/series6/
Generally speaking, when one runs clinfo, it will say for each device if it's a 'CPU' or a 'GPU'. Some differences between 'CPU' and 'GPU':
Things that work well on a GPU:
Things that work well on a CPU:
The issue with running GPU things on a CPU is that GPU programs expect to find hundreds of cores, running in warps of 32, and grind to a halt if they find themselves running on a device with just 1-4 cores. For example, sorting and reduction might be strongly reliant on having large-ish warps.
If your device really is a 'GPU', which it appears from my link to be, it's a bit mysterious why it's reporting itself as a CPU. One thing to be careful about: eg Intel CPUs contain a GPU. If you don't have the right drivers installed, on an Intel CPU you'll see the 'CPU' part of the chip as a CPU OpenCL device, and the 'GPU' part won't show up. Even though they're both on the same chip, and it's a 'CPU' chip, there's a fair bit of difference in behavior between these two parts of the same device. Is it possible that the device you are using has two parts, CPU and GPU, and the drivers are currently only showing the 'CPU' part?
Thanks a lot for your patience, hughperkins! 1) It's an ARM arm64 4-core CPU plus Imagination's PowerVR Rogue G6230 GPU, supporting OpenCL 1.2. I think this ARM chip is similar to an Intel CPU containing a GPU, all in one chip. So can I understand "the OpenCL-enabled CPU" to mean a CPU with a GPU core built in?
I dug into the code. It gets the error in EasyCL.cpp, createForIndexedGpu(int gpu, bool verbose):
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR, 100, device_ids, &num_devices);
The returned error value is -31.
However, the same API is OK in gpuinfo.cpp:
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 0, 0, &num_devices);
cl_device_id *device_ids = new cl_device_id[num_devices];
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, num_devices, device_ids, &num_devices);
It seems the clGetDeviceIDs parameters are different, but I can't understand why it fails.
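For reference, the two query styles boil down to something like the minimal C++ sketch below (single platform, minimal error handling; -31 is CL_INVALID_DEVICE_TYPE). On a strictly conforming driver the combined GPU | ACCELERATOR mask should be accepted, so a -31 here suggests the driver is rejecting the combination itself:

// Minimal sketch comparing the two clGetDeviceIDs calls discussed above.
// Assumes one OpenCL platform and the standard OpenCL 1.2 headers.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_platform_id platform = nullptr;
    cl_uint num_platforms = 0;
    clGetPlatformIDs(1, &platform, &num_platforms);

    // gpuinfo.cpp style: count ALL devices first, then fetch them.
    cl_uint num_all = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_all);
    std::vector<cl_device_id> all_devices(num_all);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_all, all_devices.data(), &num_all);
    printf("CL_DEVICE_TYPE_ALL: %u device(s)\n", num_all);

    // EasyCL createForIndexedGpu style: GPU | ACCELERATOR in a single call.
    // Some drivers reject the combined mask with -31 (CL_INVALID_DEVICE_TYPE)
    // even though a plain CL_DEVICE_TYPE_GPU query succeeds.
    cl_device_id devices[100];
    cl_uint num_devices = 0;
    cl_int err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR,
                                100, devices, &num_devices);
    printf("GPU | ACCELERATOR: err=%d (%u device(s))\n", err, num_devices);

    num_devices = 0;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 100, devices, &num_devices);
    printf("GPU only:          err=%d (%u device(s))\n", err, num_devices);
    return 0;
}

On this board, the expectation would be that the combined-mask call prints err=-31, while the plain GPU query prints err=0 with 1 device.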
Ok. Sounds like you have the CPU drivers installed for OpenCL, and you need to somehow install drivers for the GPU too.
Basically, when you run clinfo, there needs to be at least one device with type 'gpu'. Seems like you should see 2 devices in total: one CPU, and one GPU?
Can we get the full output of clinfo, and see what that says?
Sorry for the delayed reply. It seems there is only one GPU device:
./clinfo:
Number of platforms 1
Platform Name PowerVR Rogue
Platform Vendor Imagination Technologies
Platform Version OpenCL 1.2
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr
Platform Extensions function suffix IMG
Platform Name PowerVR Rogue
Number of devices 1
Device Name PowerVR Rogue G6230
Device Vendor Imagination Technologies
Device Vendor ID 0x1
Device Version OpenCL 1.2
Driver Version 1.5@3830101
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile EMBEDDED_PROFILE
Max compute units 2
Max clock frequency 352MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 0 / 0 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest No
Round to zero Yes
Round to infinity No
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 268435456 (256MiB)
Error Correction support No
Max memory allocation 67108864 (64MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 8
Max size for 1D images from buffer 16384 pixels
Max 1D or 2D image array size 16384 images
Max 2D image size 16384x16384 pixels
Max 3D image size 0x0x0 pixels
Max number of read image args 8
Max number of write image args 1
Local memory type Local
Local memory size 4096 (4KiB)
Max constant buffer size 1048576 (1024KiB)
Max number of constant args 4
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
SPIR versions 1.2
printf() buffer size 65536 (64KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [IMG]
clCreateContext(NULL, ...) [default] Success [IMG]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name PowerVR Rogue
Device Name PowerVR Rogue G6230
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name PowerVR Rogue
Device Name PowerVR Rogue G6230
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.9
ICD loader Profile OpenCL 2.1
Oh yeah, that's exactly what I'd hope to see: it has device type 'gpu', which is what we want. But when you run cuda_sample, it says "can't find GPU, falling back on CPU"?
Your cuda-on-cl invokes the EasyCL lib, so that is where the problem comes from. I edited EasyCL.cpp, createForIndexedGpu(int gpu, bool verbose):
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR, 100, device_ids, &num_devices);
change to ====>
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 100, device_ids, &num_devices);
( Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found / Trying for OpenCL-enabled CPU ) this error has disappeared.
But when I run ./cuda_sample:
~/cuda-on-cl/test/cocl$ ./cuda_sample
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
configureKernel (name=_Z8setValuePfif
kernel build error:
kernel source:
1:
2:
3:
4: kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value) {
5: data = (global float*)((global char *)data + data_offset);
6:
7: label0:;
8: int v1 = get_local_id(0);
9: bool v2 = v1 == 0;
10: if(v2) {
11: goto v4;
12: } else {
13: goto v5;
14: }
15: v4:;
16: long v6 = idx;
17: global float* v7 = (&data[v6]);
18: v7[0] = value;
19: goto v5;
20: v5:;
21: return;
22: }
23:
Invalid kernel name, code -46, kernel _Z8setValuePfif
kernel failed to build
kernel name: [_Z8setValuePfif]
saving kernel soucecode to /tmp/failed-kernel.cl
terminate called after throwing an instance of 'std::runtime_error'
what():
kernel source:
1:
2:
3:
4: kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value) {
5: data = (global float*)((global char *)data + data_offset);
6:
7: label0:;
8: int v1 = get_local_id(0);
9: bool v2 = v1 == 0;
10: if(v2) {
11: goto v4;
12: } else {
13: goto v5;
14: }
15: v4:;
16: long v6 = idx;
17: global float* v7 = (&data[v6]);
18: v7[0] = value;
19: goto v5;
20: v5:;
21: return;
22: }
23:
Invalid kernel name, code -46, kernel _Z8setValuePfif
Aborted
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR, 100, device_ids, &num_devices);
change to ====>
error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 100, device_ids, &num_devices);
Ah, interesting! Good information :-)
Invalid kernel name, code -46, kernel _Z8setValuePfif
Ok, after my procrastination, back to the problem that is really odd :-P
We can see there really is a kernel with the name _Z8setValuePfif, ie line 4 of the sourcecode dump. It's marked kernel, and the name matches exactly, including exact case match. So, that's odd. But you say that the easycl_unittests all (almost all) pass?
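For orientation: -46 is CL_INVALID_KERNEL_NAME, returned by clCreateKernel when the requested name is not among the kernels of the built program. A minimal host-side sketch of the failing lookup (context/device creation omitted; only the kernel name is taken from the dump above, the rest is assumed):

#include <CL/cl.h>
#include <cstdio>

// Sketch: build a tiny program containing _Z8setValuePfif, then look the kernel
// up by that exact name. If the device compiler silently renames or drops the
// kernel, clCreateKernel fails with CL_INVALID_KERNEL_NAME (-46) even though
// clBuildProgram reported success.
cl_int tryCreateKernel(cl_context ctx, cl_device_id device) {
    const char *source =
        "kernel void _Z8setValuePfif(global float *data, int idx, float value) {\n"
        "    if (get_local_id(0) == 0) { data[idx] = value; }\n"
        "}\n";
    cl_int err = CL_SUCCESS;
    cl_program program = clCreateProgramWithSource(ctx, 1, &source, nullptr, &err);
    err = clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "_Z8setValuePfif", &err);
    if (err != CL_SUCCESS) {
        printf("clCreateKernel failed: %d\n", err);  // -46 == CL_INVALID_KERNEL_NAME
    } else {
        clReleaseKernel(kernel);
    }
    clReleaseProgram(program);
    return err;
}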
The entire message is too long, so I put it in the attached file. easycl_unittests has similar errors in some cases; I picked several to paste here:
[ RUN ] testbuildlog.main
createForindexedgpu gpu=0 verbose=1
clpresent:1 checking platform id 0x558fea5940
clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
testbuildlog.cl build log:
BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz'
someerrorxyz;
^
kernel build error:
kernel source:
1: kernel void foo() {
2: someerrorxyz;
3: }
4:
5:
Invalid kernel name, code -46, kernel foo
testbuildlog.cl build log:
BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz'
someerrorxyz;
^
kernel source:
1: kernel void foo() {
2: someerrorxyz;
3: }
4:
5:
Invalid kernel name, code -46, kernel foo
testbuildlog.cl build log:
BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz'
someerrorxyz;
^
[ OK ] testbuildlog.main (159 ms)
[ RUN ] testTemplatedKernel.withbuilderror
createForindexedgpu gpu=0 verbose=1
clpresent:1 checking platform id 0x558fea5940
clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
kernel build error:
kernel source:
1: kernel void doStuff(int N, global int *out, global const int *in) {
2: int globalId = get_global_id(0);
3: if(globalId < N) {
4: int value = in[globalId];
5: out[globalId] = value;
6: }
7: }
8:
Invalid kernel name, code -46, kernel doStuffaaa
caught error:
kernel source:
1: kernel void doStuff(int N, global int *out, global const int *in) {
2: int globalId = get_global_id(0);
3: if(globalId < N) {
4: int value = in[globalId];
5: out[globalId] = value;
6: }
7: }
8:
Invalid kernel name, code -46, kernel doStuffaaa
[ OK ] testTemplatedKernel.withbuilderror (11 ms)
[ RUN ] testTemplatedKernel.withargserror
createForindexedgpu gpu=0 verbose=1
clpresent:1 checking platform id 0x558fea5940
clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
caught error:
kernel source:
1: kernel void doStuff(int N, global int *out, global const int *in) {
2: int globalId = get_global_id(0);
3: if(globalId < N) {
4: int value = in[globalId];
5: out[globalId] = value;
6: }
7: }
8:
Invalid kernel args, code -52
[ OK ] testTemplatedKernel.withargserror (17 ms)
Yeah, those are tests of error handling. Those tests display error messages, but don't actually fail, right? (I'm on my phone, can't read the attachment for some reason. Maybe put it on gist.github.com?)
@hughperkins I paste here: http://pastebin.com/6UzX51Ds
Looks fine. Odd. Let's start with something simple. Do you mind installing python3 and pyopencl?
sudo apt-get install python3 python-virtualenv
virtualenv -p python3 ~/env3
source ~/env3/bin/activate
pip install pyopencl
Then try running eg the following python script:
import numpy as np
import pyopencl as cl
N = 32
its = 3
a = np.random.rand(N).astype(np.float32)
gpu_idx = 0
platforms = cl.get_platforms()
i = 0
for platform in platforms:
    gpu_devices = platform.get_devices(device_type=cl.device_type.GPU)
    if gpu_idx < i + len(gpu_devices):
        ctx = cl.Context(devices=[gpu_devices[gpu_idx - i]])
        break
    i += len(gpu_devices)
print('context', ctx)
q = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_gpu = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)
prg = cl.Program(ctx, """
__kernel void mykernel(global float *data) {
int tid = get_global_id(0);
data[tid] = 123;
}
""").build()
print('run kernel...')
workgroupsize = 32
global_size = ((N + workgroupsize - 1) // workgroupsize) * workgroupsize
for it in range(its):
    prg.mykernel(q, (global_size,), (workgroupsize,), a_gpu)
a_res = np.empty_like(a)
cl.enqueue_copy(q, a_res, a_gpu)
q.finish()
print('kernel done')
print('a_res[:5]', a_res[:5])
After setting up the env, I ran your python script, and it looks good.
~/ python hello.py
('context', <pyopencl.Context at 0xe4ecf0 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rog)
run kernel...
kernel done
('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))
Cool. Ok, let's just gradually ratchet it up, till we hit the same error. What if we rename the kernel?
https://gist.github.com/hughperkins/dfe091c2637662c1ef75345c33c64ae6
The kernel's name problem ???
https://gist.github.com/alephman/f049364b0a2aa693b35da0fce380ee1e
python hello1.py
('context', <pyopencl.Context at 0xe4fa80 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>)
run kernel...
Traceback (most recent call last):
File "hello1.py", line 37, in
The kernel's name problem ???
Seems like it. What if you remove the leading underscore?
I used the function name _z8setvaluepfif instead of _Z8setValuePfif, and finally the script runs correctly: I changed the uppercase letters to lowercase. But why?
python hello1.py
('context', <pyopencl.Context at 0xe4fa80 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>)
run kernel...
kernel done
('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))
whoa, weird :-P
I guess that it's recognizing the _Z name as a c++ mangled name, and it somehow demangles it, or something.
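For what it's worth, _Z8setValuePfif is indeed a valid Itanium-mangled C++ symbol; a small sketch using the GCC/Clang demangler shows what a demangling pass would turn it into:

#include <cxxabi.h>
#include <cstdio>
#include <cstdlib>

int main() {
    int status = 0;
    // Demangle the kernel name the same way a C++-aware tool might.
    char *demangled = abi::__cxa_demangle("_Z8setValuePfif", nullptr, nullptr, &status);
    if (status == 0) {
        printf("%s\n", demangled);  // prints: setValue(float*, int, float)
        free(demangled);
    }
    return 0;
}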
Anyway, we can fix this by modifying the names as you are doing. We probably need to modify it in two locations (see the sketch after this list):
The opencl code function name can probably be modified here.
The hostside name can probably be modified here
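Roughly, both locations would push the generated kernel name through something like the hypothetical helper below (the function name and its placement are assumptions, not the actual cuda-on-cl code); the important part is that the OpenCL source and the host-side launch string end up with the same lowercased name:

#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lowercase a mangled kernel name so the device compiler
// no longer sees something that looks like an Itanium-mangled C++ symbol.
// Would need to be applied consistently in ir-to-opencl (device source) and
// patch-hostside (host-side kernel-name string).
std::string lowercaseKernelName(const std::string &mangled) {
    std::string out = mangled;
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;  // "_Z8setValuePfif" -> "_z8setvaluepfif"
}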
I tried changing the function name to lowercase at the two locations you referenced, then ran ./cocl cuda_sample and ./cuda_sample, and got another error output. Are there other places to modify? These two files are complex to me :(
dumping function _Z8setValuePfif
dumping function lowerletter _z8setvaluepfif
functionName llvm.ptx.read.tid.x
patch-hostside cuda_sample-hostraw.ll cuda_sample-device.cl cuda_sample-hostpatched.ll
reading rawhost ll file cuda_sample-hostraw.ll
reading device cl file cuda_sample-device.cl
outputing to patchedhost file cuda_sample-hostpatched.ll
getLaunchTypes() got kernel name _Z8setValuePfif
lowercase got kernel name _z8setvaluepfif
getLaunchTypes() got kernel name _Z8setValuePfif
lowercase got kernel name _z8setvaluepfif
getLaunchTypes() got kernel name _Z8setValuePfif
lowercase got kernel name _z8setvaluepfif
./cuda_sample
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
configureKernel (name=_z8setvaluepfif
cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
It might be being cached. I mean, the build artifacts. To make sure everything is up-to-date, I'd try:
make -j 4
sudo make install
make clean-tests
... then retry make run-test-cocl-cuda_sample
If that still doesn't work, you'll need to follow the breadcrumbs around, to check which bits are/aren't being updated:
build/test-cuda_sample-device.cl should contain the updated name
build/test-cuda_sample-hostpatched.ll should have a line at the top of the program defining a string, which is the kernel name. It'll be something like:
@"s.build/test-cuda_sample-device.cl._Z8setValuePfif" = global [16 x i8] c"_Z8setValuePfif\00"
...except that if it's been renamed, it should be more like:
@"s.build/test-cuda_sample-device.cl._z8setValuePfif" = global [16 x i8] c"_z8setValuePfif\00"
I'm like 80-90% sure it's just those locations.
Actually, I can see that the hostpatched file is correct for you, since the correct name is arriving in the kernel launch method at runtime:
configureKernel (name=_z8setvaluepfif
llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
I'm not really sure what this means. I mean, it's some llvm stuff, but it doesn't look like any of my own llvm stuff. I think it's something in the gpu driver. Hopefully we can fix the bug by going through the steps above, and won't need to figure out what this error message really means.
Following your guide: make clean --> make -j4 --> sudo make install --> make clean-tests --> make run-test-cocl-cuda_sample, but I still get the same error.
build/test-cocl-cuda_sample-device.cl: has the updated kernel name (lowercase) _z8setValuePfif
build/test-cocl-cuda_sample-hostpatched.ll: the line you pointed out has the updated name (lowercase): @"s.build/test-cocl-cuda_sample-device.cl._z8setValuePfif" = global [16 x i8] c"_z8setValuePfif\00"
The test-cocl-cuda_sample-device.ll and test-cocl-cuda_sample-hostpatched.ll files still contain the uppercase (old) kernel name in some places.
test-cocl-cuda_sample-device.cl http://pastebin.ubuntu.com/23425415/ test-cocl-cuda_sample-device.ll http://pastebin.ubuntu.com/23425428/
test-cocl-cuda_sample-hostpatched.ll http://pastebin.ubuntu.com/23425437/
Hmmm, the -device.cl file looks ok: the kernel name has a lowercase z.
The hostpatched.ll file looks ok: the string has a lowercase z.
Also, if you look at the top of the hostpatched.ll file, there is a string __opencl_sourcecodebuild/test-cocl-cuda_sample-device.cl. That contains the opencl sourcecode that will be used at runtime. And that has a lowercase z.
So, I think you've succeeded in changing the case of the z. So, the error is some other issue... let's try running the same kernel in pyopencl. Can you try running:
import numpy as np
import pyopencl as cl
N = 32
its = 3
a = np.random.rand(N).astype(np.float32)
gpu_idx = 0
platforms = cl.get_platforms()
i = 0
for platform in platforms:
    gpu_devices = platform.get_devices(device_type=cl.device_type.GPU)
    if gpu_idx < i + len(gpu_devices):
        ctx = cl.Context(devices=[gpu_devices[gpu_idx - i]])
        break
    i += len(gpu_devices)
print('context', ctx)
q = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_gpu = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)
prg = cl.Program(ctx, """
kernel void _z8setValuePfif(global float* data, long data_offset, int idx, float value) {
data = (global float*)((global char *)data + data_offset);
label0:;
int v1 = get_local_id(0);
bool v2 = v1 == 0;
if(v2) {
goto v4;
} else {
goto v5;
}
v4:;
long v6 = idx;
global float* v7 = (&data[v6]);
v7[0] = value;
goto v5;
v5:;
return;
}
""").build()
print('run kernel...')
workgroupsize = 32
global_size = ((N + workgroupsize - 1) // workgroupsize) * workgroupsize
for it in range(its):
    prg._z8setValuePfif(q, (global_size,), (workgroupsize,), a_gpu, np.int64(0), np.int32(0), np.float32(123))
a_res = np.empty_like(a)
cl.enqueue_copy(q, a_res, a_gpu)
q.finish()
print('kernel done')
print('a_res[:5]', a_res[:5])
The result looks like this: http://pastebin.ubuntu.com/23425881/
It looks good... The good news is, it seems to run just fine on your gpu. The bad news is, that means I have no idea why it doesn't run from within the cuda_sample executable :-P But clearly it could, if we can tweak it slightly. As to how to.... hmmm... Let's switch to running our tests in C++. Can you build/compile the following opencl c++ code please?
https://gist.github.com/hughperkins/72c9288daf7feb0105a06b1b6b875d57
(There are two files. Create them both in the same directory, then do bash runcoclissue3a.sh, and paste the output.)
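Roughly, the standalone test has this shape (a sketch, not the actual gist contents: it assumes the test fills a host array with 555, launches a setValue-style kernel to write 123 at one index, and reads the buffer back):

#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Sketch of a standalone OpenCL host-side test, independent of cocl/EasyCL.
int main() {
    const int N = 5;
    std::vector<float> a(N, 555.0f);

    cl_platform_id platform = nullptr;
    clGetPlatformIDs(1, &platform, nullptr);
    cl_device_id device = nullptr;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_int err = CL_SUCCESS;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(float) * N, a.data(), &err);

    // Same shape of kernel as the generated one, but hand-written.
    const char *source =
        "kernel void setValue(global float *data, int idx, float value) {\n"
        "    if (get_local_id(0) == 0) { data[idx] = value; }\n"
        "}\n";
    cl_program program = clCreateProgramWithSource(ctx, 1, &source, nullptr, &err);
    err = clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "setValue", &err);

    cl_int idx = 2;
    cl_float value = 123.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(cl_int), &idx);
    clSetKernelArg(kernel, 2, sizeof(cl_float), &value);

    size_t global = 32, local = 32;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, &local, 0, nullptr, nullptr);
    printf("queued kernel ok\n");
    clFinish(queue);
    printf("clfinish finished ok\n");

    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(float) * N, a.data(), 0, nullptr, nullptr);
    for (int i = 0; i < N; i++) {
        printf("a[%d]=%.0f ", i, a[i]);  // expect a[2]=123, the rest 555
    }
    printf("\n");

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}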
I added the -fPIC compiler parameter for aarch64. The result still looks good :(
mkdir -p build
clang++-3.8 -fPIC -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11
g++ -fPIC -o build/cocl-issue3-a build/cocl-issue3-a.o -lOpenCL
build/cocl-issue3-a
queued kernel ok
clfinish finished ok
a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555
There is something a bit weird here: with the first python code version you provided ( http://pastebin.ubuntu.com/23429922/ ), the result numbers are the same every time:
('context', <pyopencl.Context at 0xe4fd90 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>)
run kernel...
kernel done
('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))
but with the second version ( http://pastebin.ubuntu.com/23429949/ ) the result numbers are random, though always float type:
('context', <pyopencl.Context at 0xe4ff90 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0x7e7688>>)
run kernel...
kernel done
('a_res[:5]', array([ 1.23000000e+02, 6.69739768e-02, 9.66201425e-01, 6.00455105e-01, 5.36456466e-01], dtype=float32))
Just a slightly different version of the code. This version has random values, but always the same random values: https://gist.github.com/hughperkins/6d9d010225f8c9a5e28a7526c7063e78
The result still looks good
Cool. Ok, let's switch to EasyCL:
https://gist.github.com/hughperkins/d1d209c08c47449ce6134fce777c982b
Hi hughperkins, the steps I did are as follows:
modify the files: cocl-issue3-a.cpp file: change-->
to -->
run.sh file:
set -x
set -e
mkdir -p build
clang++-3.8 -DUSE_CLEW -fPIC -I/home/linaro/EasyCL/dist/include -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11
g++ -fPIC -pie -Wl,-rpath,/home/linaro/EasyCL/dist/lib -o build/cocl-issue3-a build/cocl-issue3-a.o -lEasyCL -lclew
build/cocl-issue3-a
this result is good:
set -e
COCL_HOME=/home/linaro/cuda-on-cl
mkdir -p build
clang++-3.8 -DUSE_CLEW -fPIC -I/home/linaro/EasyCL/dist/include -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11
g++ -fPIC -pie -Wl,-rpath,/home/linaro/EasyCL/dist/lib -o build/cocl-issue3-a build/cocl-issue3-a.o -lEasyCL -lclew
build/cocl-issue3-a
createForindexedgpu gpu=0 verbose=1
clpresent:1 checking platform id 0x558604bb90
clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
clfinish finished ok
a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555
Alright. Let's start linking with libcocl.so:
https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3
1) Trying to compile the c++ code, I got an error that the ThreadVars class is undefined. I just realized that my code was out of date. I git pulled the code up to date and found that the new code uses cmake instead of Makefile, so I tried this: patch ir-to-opencl.cpp and patch-hostside.cpp to transform the kernel name from uppercase to lowercase, then:
mkdir -p build
cd build
cmake ..
make -j4 // this step is really faster than before, cool!
sudo make install
cd ../test/cocl
cocl -fPIC cuda_sample.cu
./cuda_sample
It seems the patch doesn't work; I still get the kernel build failure (the kernel name isn't changed): http://pastebin.ubuntu.com/23434196/
2) Trying to compile the cpp test file again, I get an error like this:
fatal error: 'EasyCL.h' file not found
#include "EasyCL.h"
^
1 error generated.
I git pull the code up to date, found that new code use cmake instead of Makefile
:-P
fatal error: 'EasyCL.h' file not found
#include "EasyCL.h"
^
1 error generated.
Builds ok for me. Can you remove both your build directories, ie the one under cuda-on-cl and the one under the test file, and rebuild, and try again? You probably want to sudo make uninstall from the cuda-on-cl build directory first, and check manually that /usr/local is all clean, ie make sure no leftover cocl files/directories exist. Then rebuild and reinstall cocl at the end, ie from the cuda-on-cl build directory: make -j 4 && sudo make install
By the way, the first line of my cocl-issue3-c.cpp looks like:
#include "EasyCL/EasyCL.h"
... ie, inside EasyCL
Thanks. I removed all the files/folders you listed manually, then tried to compile it again, but got an error. I'm sure the code is up to date.
/home/linaro/cuda-on-cl-bak1/src/ir-to-opencl.cpp:22:31: fatal error: cocl/local_config.h: No such file or directory
I searched for the local_config.h file; it doesn't exist in this project.
Oh. I need to fix that. There should be a file like 'local_config.h.templ'. Can you manually copy that as local_config.h, and put it somewhere includable? (Probably in /usr/local/include/cocl)
cuda-on-cl compiling is successful, but when I run your test cpp file, I still get an error:
//usr/local/lib/libcocl.so: undefined reference to `clblast::CacheClearAll()'
//usr/local/lib/libcocl.so: undefined reference to `CLBlastSgemm'
collect2: error: ld returned 1 exit status
Cool. For the undefined reference issue, can you try adding -lclblast to the g++ commandline, just after -lcocl, please?
Thanks! 1) it sounds good.
2) sudo make install misses copying cocl to /usr/local/bin. After cp bin/cocl /usr/local/bin, running cocl -fPIC cuda_sample.cu gives:
make: /usr/local/bin/../share/cocl/cocl.Makefile: No such file or directory
make: *** No rule to make target '/usr/local/bin/../share/cocl/cocl.Makefile'. Stop.
3) I tried to enable run-tests, so I ran: cmake .. -DBUILD_TESTS=ON and make -j4
and get an error like this: http://pastebin.ubuntu.com/23435852/
sudo make install misses copying cocl to /usr/local/bin:
Good point! Addressed in 2fc6a8a
fatal error: 'sys/cdefs.h' file not found
I had the same error on Ubuntu. I never quite understood it, but fixed it by doing something like:
sudo apt-get install libc6-dev:i386
(see for example:
$ cocl -fPIC cuda_sample.cu
/usr/local/share/cocl/cocl.Makefile: No such file or directory
?? There is only a cocl.cmake file in the /usr/local/share/cocl folder.
Fixed it in latest master https://github.com/hughperkins/cuda-on-cl/commit/ac9ce6474bd43a625c0995ce63341b69198d62f7 . I'm building a dockerfile so I can check the build on a clean machine :-) https://github.com/hughperkins/cuda-on-cl/blob/master/docker/Dockerfile
Sounds very cool! How does the docker support GPU?
It doesn't, unless you use eg nvidia-docker. Each vendor/driver would need to supply their own solution. But at least it means I can find the various install script bugs myself, instead of waiting for other people to find them for me :-)
The test program you provided (https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3) is still good,
but I still get the error when I run cocl -fPIC cuda_sample.cu:
building kernel _z8setvaluepfif
cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
I can't find any related information via Google :(
the test program you provided (https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3) still good.
Cool. Let's keep going :-)
Let's use cocl::Memory to do the device allocation:
https://gist.github.com/hughperkins/08bb351022e0ed3dcff251c5a1151b6e
oh wait...
My board info: arm64 debian(jessie) , GCC 6.2, LLVM 3.8.
g++ -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL
/usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C
/usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed
make: *** [build/test-cocl-cuda_sample] Error 1