hughperkins / coriander

Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices
Apache License 2.0
842 stars 88 forks source link

porting on aarch64 #3

Closed alephman closed 7 years ago

alephman commented 8 years ago

My board info: arm64 debian(jessie) , GCC 6.2, LLVM 3.8.

  1. this step is success. git clone --recursive https://github.com/hughperkins/cuda-on-cl cd cuda-on-cl make sudo make install
  2. make run-test-cocl-cuda_sample

g++ -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL /usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C /usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel' /usr/bin/ld: final link failed: Bad value collect2: error: ld returned 1 exit status Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed make: *\ [build/test-cocl-cuda_sample] Error 1

alephman commented 8 years ago

@hughperkins I use aarch64 instead of x86 in Makefile, but still can't be passed when I run "make run-tests". I am not familiar with OpenCI and cuda, is there any idea or direction to debug it?

g++ -fPIC  -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL
/usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C
/usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed
make: **\* [build/test-cocl-cuda_sample] Error 1
alephman commented 8 years ago

after adding -fPIC paramter, it is compiled successfully.

$ ./cuda_sample

Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found Trying for OpenCL-enabled CPU Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 configureKernel (name=_Z8setValuePfif kernel build error:

kernel source: 1: 2: 3: 4: kernel void _Z8setValuePfif(global float* data, long dataoffset, int idx, float value) { 5: data = (global float)((global char *)data + data_offset); 6: 7: label0:; 8: int v1 = get_localid(0); 9: bool v2 = v1 == 0; 10: if(v2) { 11: goto v4; 12: } else { 13: goto v5; 14: } 15: v4:; 16: long v6 = idx; 17: global float v7 = (&data[v6]); 18: v7[0] = value; 19: goto v5; 20: v5:; 21: return; 22: } 23:

Invalid kernel name, code -46, kernel _Z8setValuePfif

kernel failed to build kernel name: [_Z8setValuePfif] saving kernel soucecode to /tmp/failed-kernel.cl terminate called after throwing an instance of 'std::runtime_error' what():
kernel source: 1: 2: 3: 4: kernel void _Z8setValuePfif(global float* data, long dataoffset, int idx, float value) { 5: data = (global float)((global char *)data + data_offset); 6: 7: label0:; 8: int v1 = get_localid(0); 9: bool v2 = v1 == 0; 10: if(v2) { 11: goto v4; 12: } else { 13: goto v5; 14: } 15: v4:; 16: long v6 = idx; 17: global float v7 = (&data[v6]); 18: v7[0] = value; 19: goto v5; 20: v5:; 21: return; 22: } 23:

Invalid kernel name, code -46, kernel _Z8setValuePfif

Aborted

hughperkins commented 8 years ago

after adding -fPIC paramter, it is compiled successfully.

Nice! :-)

Invalid kernel name, code -46, kernel _Z8setValuePfif

Thats a weird error. But before we dig into that, note that it looks like your OpenCL device is a CPU, rather than a GPU? :

Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU

Is this what you expect?

alephman commented 8 years ago

1) What does "OpenCL-enabled CPU" mean? I use gpuinfo and clinfo to list GPU info, it looks good. Or Is there another simple test program to help me to check that?

2) I try runing EasyCL's easycl_unittests:

args: ./easycl_unittests --gtest_filter=-SLOW Note: Google Test filter = -SLOW [==========] Running 63 tests from 25 test cases. [----------] Global test environment set-up. [----------] 1 test from testscalars [ RUN ] testscalars.test1 found opencl library Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found Trying for OpenCL-enabled CPU Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 ..... .....

time: 720.502 [ OK ] teststatefultimer.notiming (1131 ms) [----------] 2 tests from teststatefultimer (2325 ms total)

[----------] Global test environment tear-down [==========] 63 tests from 25 test cases ran. (11010 ms total) [ PASSED ] 58 tests. [ FAILED ] 5 tests, listed below: [ FAILED ] testscalars.test1 [ FAILED ] testnewinstantiations.createForFirstGpu [ FAILED ] testnewinstantiations.createForIndexedGpu [ FAILED ] testDeviceInfo.basic [ FAILED ] testDeviceInfo.gpus

hughperkins commented 8 years ago

Hmmm, certainly seems gpu-like https://imgtec.com/powervr/graphics/series6/

Generally speaking, when one runs clinfo, it will say for each device if it's a 'CPU' or a 'GPU'. Some differences between 'CPU' and 'GPU':

Things that work well on a GPU:

Things that work well on a CPU:

The issue with running GPU things on a CPU, is the GPU programs expect to find hundreds of cores, runningin waprs of 32, and crash to a halt if they find themselves running on a device iwht just 1-4 cores. For example, sorting and reduction might be strongly reliant on having large-ish warps.

If your device really is a 'GPU', which it appears from my link to be, it's a bit mysterious why it's reporting itself as a CPU. One thing to be careful about: eg Intel CPUs contain a GPU. If you dont have the right drivers installed, on an intel cpu, you'll see the 'CPU' bit of the cpu, as a CPU opencl device, and the 'GPU' bit wont show up. Even though they're both on the same chip, and it's a 'cpu' chip, theres a fair bit of difference in behavior between these two parts of the same device. Is it possible that the device you are using has two parts, CPU and GPU, and the drivers are currently only showing hte 'CPU' part?

alephman commented 8 years ago

Thanks a lot for your patient, hughperkins ! 1) It's a ARM arm64 4 cores CPU+ Imagination's PowerVR Rogue G6230 GPU, support OpenCL 1.2. I think this ARM chip is similar to Intel CPU contains a GPU, all in one chip. So can I understand that "The OpenCL-enabled CPU" is a CPU built-in GPU core inside?

I dig into the code, It get error at EasyCL.cpp: createForIndexedGpu(int gpu, bool verbose)---> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR , 100, device_ids, &num_devices); The returned error value is -31

However, the same API is OK in gpuinfo.cpp: error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 0, 0, &num_devices); cl_device_id *device_ids = new cl_device_id[num_devices]; error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, num_devices, device_ids, &num_devices);

It seems the clGetDeviceIDs functions's parameters is different. But I can't understand why it failed ?

hughperkins commented 8 years ago

Ok. Sounds like you have the CPU drivers installed for opencl, and you need to install somehow drivers for the GPU too.

Basically, when you run clinfo there needs to be at least one device with type 'gpu'. Seems like you should see total 2 devices: one CPU, and one GPU?

On 3 November 2016 01:48:37 GMT+00:00, alephman notifications@github.com wrote:

Thanks a lot for your patient, hughperkins ! 1) It's a ARM arm64 4 cores CPU+ Imagination's PowerVR Rogue G6230 GPU, support OpenCL 1.2. I think this ARM chip is similar to Intel CPU contains a GPU, all in one chip. So can I understand that "The OpenCL-enabled CPU" is a CPU built-in GPU core inside?

I dig into the code, It get error at EasyCL.cpp: createForIndexedGpu(int gpu, bool verbose)---> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR , 100, device_ids, &num_devices); The returned error value is -31

However, the same API is OK in gpuinfo.cpp: error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 0, 0, &num_devices); cl_device_id *device_ids = new cl_device_id[num_devices]; error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, num_devices, device_ids, &num_devices);

It seems the clGetDeviceIDs functions's parameters is different. But I can't understand why it failed ?

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/cuda-on-cl/issues/3#issuecomment-258048882

Sent from my Android device with K-9 Mail. Please excuse my brevity.

hughperkins commented 8 years ago

Can we get the full output of clinfo, and see what that says?

On 3 November 2016 01:48:37 GMT+00:00, alephman notifications@github.com wrote:

Thanks a lot for your patient, hughperkins ! 1) It's a ARM arm64 4 cores CPU+ Imagination's PowerVR Rogue G6230 GPU, support OpenCL 1.2. I think this ARM chip is similar to Intel CPU contains a GPU, all in one chip. So can I understand that "The OpenCL-enabled CPU" is a CPU built-in GPU core inside?

I dig into the code, It get error at EasyCL.cpp: createForIndexedGpu(int gpu, bool verbose)---> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR , 100, device_ids, &num_devices); The returned error value is -31

However, the same API is OK in gpuinfo.cpp: error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 0, 0, &num_devices); cl_device_id *device_ids = new cl_device_id[num_devices]; error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, num_devices, device_ids, &num_devices);

It seems the clGetDeviceIDs functions's parameters is different. But I can't understand why it failed ?

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/cuda-on-cl/issues/3#issuecomment-258048882

Sent from my Android device with K-9 Mail. Please excuse my brevity.

alephman commented 8 years ago

Sorry for delaying relay. it seems only One GPU device:

./clinfo:

Number of platforms 1 Platform Name PowerVR Rogue Platform Vendor Imagination Technologies Platform Version OpenCL 1.2 Platform Profile EMBEDDED_PROFILE Platform Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr Platform Extensions function suffix IMG

Platform Name PowerVR Rogue Number of devices 1 Device Name PowerVR Rogue G6230 Device Vendor Imagination Technologies Device Vendor ID 0x1 Device Version OpenCL 1.2 Driver Version 1.5@3830101 Device OpenCL C Version OpenCL C 1.2 Device Type GPU Device Profile EMBEDDED_PROFILE Max compute units 2 Max clock frequency 352MHz Device Partition (core) Max number of sub-devices 1 Supported partition types none specified Max work item dimensions 3 Max work item sizes 512x512x512 Max work group size 512 Preferred work group size multiple 32 Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a) float 1 / 1
double 0 / 0 (n/a) Half-precision Floating-point support (n/a) Single-precision Floating-point support (core) Denormals No Infinity and NANs Yes Round to nearest No Round to zero Yes Round to infinity No IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (n/a) Address bits 32, Little-Endian Global memory size 268435456 (256MiB) Error Correction support No Max memory allocation 67108864 (64MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size 32768 Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 8 Max size for 1D images from buffer 16384 pixels Max 1D or 2D image array size 16384 images Max 2D image size 16384x16384 pixels Max 3D image size 0x0x0 pixels Max number of read image args 8 Max number of write image args 1 Local memory type Local Local memory size 4096 (4KiB) Max constant buffer size 1048576 (1024KiB) Max number of constant args 4 Max size of kernel argument 1024 Queue properties
Out-of-order execution Yes Profiling Yes Prefer user sync for interop Yes Profiling timer resolution 1000ns Execution capabilities
Run OpenCL kernels Yes Run native kernels No SPIR versions 1.2 printf() buffer size 65536 (64KiB) Built-in kernels
Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr

NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [IMG] clCreateContext(NULL, ...) [default] Success [IMG] clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1) Platform Name PowerVR Rogue Device Name PowerVR Rogue G6230 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1) Platform Name PowerVR Rogue Device Name PowerVR Rogue G6230

ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.2.9 ICD loader Profile OpenCL 2.1

hughperkins commented 8 years ago

Oh yeah, that's exactly what I'd hope to see: it has device type 'gpu', which is what we want. But when you run cuda_sample, it says "can't find GPU, falling back on CPU"?

On 3 November 2016 09:21:03 GMT+00:00, alephman notifications@github.com wrote:

Sorry for delaying relay. it seems only One GPU device:

./clinfo:

Number of platforms 1 Platform Name PowerVR Rogue Platform Vendor Imagination Technologies Platform Version OpenCL 1.2 Platform Profile EMBEDDED_PROFILE Platform Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr Platform Extensions function suffix IMG

Platform Name PowerVR Rogue Number of devices 1 Device Name PowerVR Rogue G6230 Device Vendor Imagination Technologies Device Vendor ID 0x1 Device Version OpenCL 1.2 Driver Version 1.5@3830101 Device OpenCL C Version OpenCL C 1.2 Device Type GPU Device Profile EMBEDDED_PROFILE Max compute units 2 Max clock frequency 352MHz Device Partition (core) Max number of sub-devices 1 Supported partition types none specified Max work item dimensions 3 Max work item sizes 512x512x512 Max work group size 512 Preferred work group size multiple 32 Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a) float 1 / 1
double 0 / 0 (n/a) Half-precision Floating-point support (n/a) Single-precision Floating-point support (core) Denormals No Infinity and NANs Yes Round to nearest No Round to zero Yes Round to infinity No IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (n/a) Address bits 32, Little-Endian Global memory size 268435456 (256MiB) Error Correction support No Max memory allocation 67108864 (64MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size 32768 Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 8 Max size for 1D images from buffer 16384 pixels Max 1D or 2D image array size 16384 images Max 2D image size 16384x16384 pixels Max 3D image size 0x0x0 pixels Max number of read image args 8 Max number of write image args 1 Local memory type Local Local memory size 4096 (4KiB) Max constant buffer size 1048576 (1024KiB) Max number of constant args 4 Max size of kernel argument 1024 Queue properties
Out-of-order execution Yes Profiling Yes Prefer user sync for interop Yes Profiling timer resolution 1000ns Execution capabilities
Run OpenCL kernels Yes Run native kernels No SPIR versions 1.2 printf() buffer size 65536 (64KiB) Built-in kernels
Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_icd cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_khr_spir cl_img_yuv_image cl_img_mipmap_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr

NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [IMG] clCreateContext(NULL, ...) [default] Success [IMG] clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1) Platform Name PowerVR Rogue Device Name PowerVR Rogue G6230 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1) Platform Name PowerVR Rogue Device Name PowerVR Rogue G6230

ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.2.9 ICD loader Profile OpenCL 2.1

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/cuda-on-cl/issues/3#issuecomment-258094290

Sent from my Android device with K-9 Mail. Please excuse my brevity.

alephman commented 8 years ago

Your cuda-on-cl invokes EasyCL lib, so it will cause the problem. I edited the EasyCL.cpp: createForIndexedGpu(int gpu, bool verbose)---> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR , 100, device_ids, &num_devices); change to ====> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU , 100, device_ids, &num_devices);

( Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found Trying for OpenCL-enabled CPU ) this error is disappear.

But when I run ./cuda_sample:

~/cuda-on-cl/test/cocl$ ./cuda_sample Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 configureKernel (name=_Z8setValuePfif kernel build error:

kernel source: 1: 2: 3: 4: kernel void _Z8setValuePfif(global float* data, long dataoffset, int idx, float value) { 5: data = (global float)((global char *)data + data_offset); 6: 7: label0:; 8: int v1 = get_localid(0); 9: bool v2 = v1 == 0; 10: if(v2) { 11: goto v4; 12: } else { 13: goto v5; 14: } 15: v4:; 16: long v6 = idx; 17: global float v7 = (&data[v6]); 18: v7[0] = value; 19: goto v5; 20: v5:; 21: return; 22: } 23:

Invalid kernel name, code -46, kernel _Z8setValuePfif

kernel failed to build kernel name: [_Z8setValuePfif] saving kernel soucecode to /tmp/failed-kernel.cl terminate called after throwing an instance of 'std::runtime_error' what():
kernel source: 1: 2: 3: 4: kernel void _Z8setValuePfif(global float* data, long dataoffset, int idx, float value) { 5: data = (global float)((global char *)data + data_offset); 6: 7: label0:; 8: int v1 = get_localid(0); 9: bool v2 = v1 == 0; 10: if(v2) { 11: goto v4; 12: } else { 13: goto v5; 14: } 15: v4:; 16: long v6 = idx; 17: global float v7 = (&data[v6]); 18: v7[0] = value; 19: goto v5; 20: v5:; 21: return; 22: } 23:

Invalid kernel name, code -46, kernel _Z8setValuePfif

Aborted

hughperkins commented 8 years ago

error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR , 100, device_ids, &num_devices); change to ====> error = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU , 100, device_ids, &num_devices);

Ah, interesting! Good information :-)

Invalid kernel name, code -46, kernel _Z8setValuePfif

Ok, after my procrastination, back to the problem that is really odd :-P

We can see there really is a kernel with the name _Z8setValuePfif, ie line 4 of the sourcecode dump. It's marked kernel, and the name matches exactly, including exact case match. So, that's odd. But, you say that the easycl_unittests all (almost all) pass?

alephman commented 8 years ago

The entire message is too long, I put it on attached file. easycl_unittests has similar errors in some cases, I pick up several to paste here:

RUN ] testbuildlog.main createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

kernel build error:

kernel source: 1: kernel void foo() { 2: someerrorxyz; 3: } 4: 5:

Invalid kernel name, code -46, kernel foo testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

kernel source: 1: kernel void foo() { 2: someerrorxyz; 3: } 4: 5:

Invalid kernel name, code -46, kernel foo testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

[ OK ] testbuildlog.main (159 ms)

[ RUN ] testTemplatedKernel.withbuilderror createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 kernel build error:

kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel name, code -46, kernel doStuffaaa

caught error: kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel name, code -46, kernel doStuffaaa

[ OK ] testTemplatedKernel.withbuilderror (11 ms)

[ RUN ] testTemplatedKernel.withargserror createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 caught error: kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel args, code -52 [ OK ] testTemplatedKernel.withargserror (17 ms)

log.txt

hughperkins commented 8 years ago

Yeah, those are tests of error handling. Those tests display error messages, but don't actually fail, right? (I'm on my phone, can't read the attachment for some reason. Maybe put it on gist.github.com?)

On 3 November 2016 12:53:51 GMT+00:00, alephman notifications@github.com wrote:

The entire message is too long, I put it on attached file. easycl_unittests has similar errors in some cases, I pick up several to paste here:

RUN ] testbuildlog.main createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

kernel build error:

kernel source: 1: kernel void foo() { 2: someerrorxyz; 3: } 4: 5:

Invalid kernel name, code -46, kernel foo testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

kernel source: 1: kernel void foo() { 2: someerrorxyz; 3: } 4: 5:

Invalid kernel name, code -46, kernel foo testbuildlog.cl build log: BuildGroup_21:2:4: error: use of undeclared identifier 'someerrorxyz' someerrorxyz; ^

[ OK ] testbuildlog.main (159 ms)

[ RUN ] testTemplatedKernel.withbuilderror createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 kernel build error:

kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel name, code -46, kernel doStuffaaa

caught error: kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel name, code -46, kernel doStuffaaa

[ OK ] testTemplatedKernel.withbuilderror (11 ms)

[ RUN ] testTemplatedKernel.withargserror createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558fea5940 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 caught error: kernel source: 1: kernel void doStuff(int N, global int out, global const int in) { 2: int globalId = get_global_id(0); 3: if(globalId < N) { 4: int value = in[globalId]; 5: out[globalId] = value; 6: } 7: } 8:

Invalid kernel args, code -52 [ OK ] testTemplatedKernel.withargserror (17 ms)

log.txt

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/cuda-on-cl/issues/3#issuecomment-258134080

Sent from my Android device with K-9 Mail. Please excuse my brevity.

alephman commented 8 years ago

@hughperkins I paste here: http://pastebin.com/6UzX51Ds

hughperkins commented 8 years ago

Looks fine. Odd. Let's start with something simple. Do you mind installing python3 and pyopencl?

sudo apt-get install python3 python-virtualenv
virtualenv -p ~/env3
source ~/env3/bin/activate
pip install pyopencl

Then try running eg the following python script:

import numpy as np
import pyopencl as cl

N = 32
its = 3

a = np.random.rand(N).astype(np.float32)

gpu_idx = 0

platforms = cl.get_platforms()
i = 0
for platform in platforms:
    gpu_devices = platform.get_devices(device_type=cl.device_type.GPU)
    if gpu_idx < i + len(gpu_devices):
        ctx = cl.Context(devices=[gpu_devices[gpu_idx - i]])
        break
    i += len(gpu_devices)

print('context', ctx)
q = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_gpu = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

prg = cl.Program(ctx, """
__kernel void mykernel(global float *data) {
    int tid = get_global_id(0);
    data[tid] = 123;
}
""").build()

print('run kernel...')
workgroupsize = 32
global_size = ((N + workgroupsize - 1) // workgroupsize) * workgroupsize
for it in range(its):
    prg.mykernel(q, (global_size,), (workgroupsize,), a_gpu)

a_res = np.empty_like(a)
cl.enqueue_copy(q, a_res, a_gpu)

q.finish()
print('kernel done')

print('a_res[:5]', a_res[:5])
alephman commented 8 years ago

after setting up the env, I run your python script, it looks good. ~/ python hello.py

('context', <pyopencl.Context at 0xe4ecf0 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rog) run kernel... kernel done ('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))

hughperkins commented 8 years ago

cool. Ok, let's just gradually ratchet it up, till we hit the same error. what if we rename the kernel?

https://gist.github.com/hughperkins/dfe091c2637662c1ef75345c33c64ae6

alephman commented 8 years ago

The kernel's name problem ???

https://gist.github.com/alephman/f049364b0a2aa693b35da0fce380ee1e

python hello1.py ('context', <pyopencl.Context at 0xe4fa80 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>) run kernel... Traceback (most recent call last): File "hello1.py", line 37, in prg._Z8setValuePfif(q, (global_size,), (workgroupsize,), a_gpu) File "/usr/local/lib/python2.7/dist-packages/pyopencl/init.py", line 321, in getattr "info attribute or as a kernel name" % attr) AttributeError: '_Z8setValuePfif' was not found as a program info attribute or as a kernel name

hughperkins commented 8 years ago

The kernel's name problem ???

Seems like it. What if you remove the leading underscore?

alephman commented 8 years ago

I use _z8setvaluepfif function name instead of _Z8setValuePfif, finally the script runs correctly. Change uppercase letters to lowercase letters. But why?

python hello1.py ('context', <pyopencl.Context at 0xe4fa80 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>) run kernel... kernel done ('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))

hughperkins commented 8 years ago

whoa, weird :-P

I guess that it's recognizing the _Z name as a c++ mangled name, and it somehow demangles it, or something.

Anyway, we can fix this by modifying the names as you are doing. We probalby need to modify it in two locations:

The opencl code function name can probaby be modified here.

The hostside name can probably be modified here

alephman commented 8 years ago

trying to change the function name as lowercase at two locations you reference, then run ./cocl cuda_sample and ./cuda_sample , get another error output. is there other places to modify ? This two files is complex to me :(

dumping function _Z8setValuePfif dumping function lowerletter _z8setvaluepfif functionName llvm.ptx.read.tid.x patch-hostside cuda_sample-hostraw.ll cuda_sample-device.cl cuda_sample-hostpatched.ll reading rawhost ll file cuda_sample-hostraw.ll reading device cl file cuda_sample-device.cl outputing to patchedhost file cuda_sample-hostpatched.ll getLaunchTypes() got kernel name _Z8setValuePfif lowercase got kernel name _z8setvaluepfif getLaunchTypes() got kernel name _Z8setValuePfif lowercase got kernel name _z8setvaluepfif getLaunchTypes() got kernel name _Z8setValuePfif lowercase got kernel name _z8setvaluepfif

clang++-3.8 -fPIC -c cuda_sample-hostpatched.ll -O3 -o cuda_sample.o

./cuda_sample Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 configureKernel (name=_z8setvaluepfif cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed. Stack dump:

  1. Running pass 'UniFlex generator' on module 'BuildGroup_1'. Aborted
hughperkins commented 8 years ago

It might be being cached. I mean, the build artifacts. To make sure everything is up-to-date, I'd try:

make -j 4
sudo make install
make clean-tests

... then retry make run-test-cocl-cuda_sample

If that still dosnt work, you'll need to follow the breadcrumbs around, to check which bits are/arent being updated:

@"s.build/test-cuda_sample-device.cl._Z8setValuePfif" = global [16 x i8] c"_Z8setValuePfif\00"

...ecxept that if it's been renamed, it should be more like:

@"s.build/test-cuda_sample-device.cl._z8setValuePfif" = global [16 x i8] c"_z8setValuePfif\00"

I'm like 80-90% sure it's just those locations.

Actually, I can see that hte hostpatched is correct for you, since hte ocrrect name is arriving into the kernel launch method at runtime:

configureKernel (name=_z8setvaluepfif

llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.

I'm not really sure what this means. I mean, its some llvm stuff, but doesnt look any of my own llvm stuff. I think it's something in the gpu driver. Hopefully we can fix the bug by going through the steps above, and dont need to figure out what this error message really means.

alephman commented 8 years ago

Following your guide: make clean --> make -j4 --> sudo make install -->make clean-tests -->make run-test-cocl-cuda_sample, but still got the same error.

build/est-cocl-cuda_sample-device.cl: has the updated kernel name( lowercase letters) _z8setValuePfif

build/est-cocl-cuda_sample-hostpatched.ll: the line you pointed out is updated name( lowercase letters) @"s.build/est-cocl-cuda_sample-device.cl._z8setValuePfif" = global [16 x i8] c"_z8setValuePfif\00"

The test-cocl-cuda_sample-device.ll and test-cocl-cuda_sample-hostpatched.ll files in some places still have Uppercase letters kernel name (old name).

test-cocl-cuda_sample-device.cl http://pastebin.ubuntu.com/23425415/ test-cocl-cuda_sample-device.ll http://pastebin.ubuntu.com/23425428/

test-cocl-cuda_sample-hostpatched.ll http://pastebin.ubuntu.com/23425437/

hughperkins commented 8 years ago

Hmmm, the -device.cl file looks ok: kernel name has lowercase z.

The hostpatched.ll file looks ok: the string has lowercase z.

Also, if you look at the top of the hostpatched.ll file, there is a string __opencl_sourcecodebuild/test-cocl-cuda_sample-device.cl. That contains the openl sourcecode that will be used at runtime. And that has a lowercase z.

So, I think you've succeeded in changing the case of the z. So, the error is some other issue... let's try running the same kernel in pyopencl. Can you try running:

import numpy as np
import pyopencl as cl

N = 32
its = 3

a = np.random.rand(N).astype(np.float32)

gpu_idx = 0

platforms = cl.get_platforms()
i = 0
for platform in platforms:
    gpu_devices = platform.get_devices(device_type=cl.device_type.GPU)
    if gpu_idx < i + len(gpu_devices):
        ctx = cl.Context(devices=[gpu_devices[gpu_idx - i]])
        break
    i += len(gpu_devices)

print('context', ctx)
q = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_gpu = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

prg = cl.Program(ctx, """
kernel void _z8setValuePfif(global float* data, long data_offset, int idx, float value) {
   data = (global float*)((global char *)data + data_offset);

   label0:;
   int v1 = get_local_id(0);
    bool v2 = v1 == 0;
    if(v2) {
        goto v4;
    } else {
        goto v5;
    }
    v4:;
    long v6 = idx;
    global float* v7 = (&data[v6]);
    v7[0] = value;
        goto v5;
    v5:;
    return;
}
""").build()

print('run kernel...')
workgroupsize = 32
global_size = ((N + workgroupsize - 1) // workgroupsize) * workgroupsize
for it in range(its):
    prg._z8setValuePfif(q, (global_size,), (workgroupsize,), a_gpu, np.int64(0), np.int32(0), np.float32(123))

a_res = np.empty_like(a)
cl.enqueue_copy(q, a_res, a_gpu)

q.finish()
print('kernel done')

print('a_res[:5]', a_res[:5])
alephman commented 8 years ago

the result like this: http://pastebin.ubuntu.com/23425881/

hughperkins commented 8 years ago

It looks good... The good news is, it seems to run just fine on your gpu. The bad news is, that means I have no idea why the it doesnt run from within the cuda_sample executable :-P But clearly it could, if we can tweak it slightly. As to how to.... hmmm... Lets switch to running our tests in c++. Can you build/compile the following opencl c++ code please?

https://gist.github.com/hughperkins/72c9288daf7feb0105a06b1b6b875d57

(There are two files. Create them both in the same directory, then do bash runcoclissue3a.sh, and paste the output)

alephman commented 8 years ago

Add -fPIC compiler parameter for aarch64. The result still looks good:(...

mkdir -p build clang++-3.8 -fPIC -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11 g++ -fPIC -o build/cocl-issue3-a build/cocl-issue3-a.o -lOpenCL build/cocl-issue3-a queued kernel ok clfinish finished ok a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555

alephman commented 8 years ago

There is a bit weird here: you provided first python code version: http://pastebin.ubuntu.com/23429922/ . the result's number is same every time: 'context', <pyopencl.Context at 0xe4fd90 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0xa7b788>>) run kernel... kernel done ('a_res[:5]', array([ 123., 123., 123., 123., 123.], dtype=float32))

but the second version http://pastebin.ubuntu.com/23429949/ the result number is random, but always float type:

('context', <pyopencl.Context at 0xe4ff90 on <pyopencl.Device 'PowerVR Rogue G6230' on 'PowerVR Rogue' at 0x7e7688>>) run kernel... kernel done ('a_res[:5]', array([ 1.23000000e+02, 6.69739768e-02, 9.66201425e-01, 6.00455105e-01, 5.36456466e-01], dtype=float32))

hughperkins commented 8 years ago

just slightly different version of the code. this version has random, but always same random: https://gist.github.com/hughperkins/6d9d010225f8c9a5e28a7526c7063e78

hughperkins commented 8 years ago

The result still looks good

Cool. Ok, let's switch to EasyCL:

https://gist.github.com/hughperkins/d1d209c08c47449ce6134fce777c982b

alephman commented 8 years ago

Hi hughperkins, The steps I did like this:

  1. download and compile the easycl cd /home/linaro git clone --recursive https://github.com/hughperkins/EasyCL.git cd EasyCL mkdir build cd build cmake .. make install

modify the files: cocl-issue3-a.cpp file: change-->

include " EasyCL/EasyCL.h"

to -->

include "easycl/EasyCL.h"

run.sh file:

!/bin/bash

set -x set -e

COCL_HOME=/home/linaro/cuda-on-cl # or wherever it is

mkdir -p build clang++-3.8 -DUSE_CLEW -fPIC -I/home/linaro/EasyCL/dist/include -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11 g++ -fPIC -pie -Wl,-rpath,/home/linaro/EasyCL/dist/lib -o build/cocl-issue3-a build/cocl-issue3-a.o -lEasyCL -lclew build/cocl-issue3-a

this result is good:

set -e COCL_HOME=/home/linaro/cuda-on-cl mkdir -p build clang++-3.8 -DUSE_CLEW -fPIC -I/home/linaro/EasyCL/dist/include -c -o build/cocl-issue3-a.o cocl-issue3-a.cpp -std=c++11 g++ -fPIC -pie -Wl,-rpath,/home/linaro/EasyCL/dist/lib -o build/cocl-issue3-a build/cocl-issue3-a.o -lEasyCL -lclew build/cocl-issue3-a createForindexedgpu gpu=0 verbose=1 clpresent:1 checking platform id 0x558604bb90 clGetDeviceIDs:0gpu=0 currentGpuIndex=0 num_devices=1 Using Imagination Technologies , OpenCL platform: PowerVR Rogue Using OpenCL device: PowerVR Rogue G6230 clfinish finished ok a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555

hughperkins commented 8 years ago

Alright. Let's start linking with libcocl.so:

https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3

alephman commented 8 years ago

1) trying to compile the c++ code, but get an error that undefine the ThreadVars class. I just realized that my code is out of date. I git pull the code up to date, found that new code use cmake instead of Makefile. so I try to do like this: patch ir-to-opencl.cpp and patch-hostside.cpp for transform kernel name uppercase to lowercase. mkdir -p build cd build cmake .. make -j4 // this step is really faster than old, cool! sudo make install

cd ../test/cocl cocl -fPIC cuda_sample.cu ./cuda_sample

it seems the patch doesn't work, I still get build kernel failed( the kernel name doesn't be changed) http://pastebin.ubuntu.com/23434196/

2) trying to compile the cpp test file again, get an error like this:

hughperkins commented 8 years ago

I git pull the code up to date, found that new code use cmake instead of Makefile

:-P

fatal error: 'EasyCL.h' file not found #include "EasyCL.h" ^ 1 error generated.

Builds ok for me. Can you remove both your build directories, ie the one under cuda-on-cl, and the one under the test file, and rebuild, and try again? You probably want to sudo make uninstall from cuda-on-cl build directory first, and check manually that /usr/local is all clean. ie make sure the following files/directories dont eixst:

make -j 4 && sudo make install

By the way, the first line of mycocl-issue3-c.cpp looks like:

#include "EasyCL/EasyCL.h"

... ie, inside EasyCL

alephman commented 8 years ago

thanks. I removed all files/folders you listed manully, then try to compile it again,but get an error. I m sure the code is up to date.

/home/linaro/cuda-on-cl-bak1/src/ir-to-opencl.cpp:22:31: fatal error: cocl/local_config.h: No such file or directory

include "cocl/local_config.h"

I search the local_config.h file, it doesn't exist in this project.

hughperkins commented 8 years ago

Oh. I need to fix that. There should be a file like 'local_config.h.templ'. can you manually copy that as local_config.h, and put it somewhere includable? (Probably in /usr/local/include/cocl)

On 6 November 2016 09:11:24 GMT+00:00, alephman notifications@github.com wrote:

thanks. I removed all files/folders you listed manully, then try to compile it again,but get an error. I m sure the code is up to date.

/home/linaro/cuda-on-cl-bak1/src/ir-to-opencl.cpp:22:31: fatal error: cocl/local_config.h: No such file or directory

include "cocl/local_config.h"

I search the local_config.h file, it doesn't exist in this project.

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/cuda-on-cl/issues/3#issuecomment-258668743

Sent from my Android device with K-9 Mail. Please excuse my brevity.

alephman commented 8 years ago

cuda-on-cl compiling is successful, bu I run your test cpp file, still get an error:

hughperkins commented 8 years ago

Cool. For the undefined refernece issue, can you try adding -lclblast to the g++ commandline, just after -lcocl, please?

alephman commented 8 years ago

Thanks! 1) it sounds good.

2) miss copying cocl to /usr/local/bin when sudo make install: cp bin/cocl /usr/local/bin cocl -fPIC cuda_sample.cu

make: /usr/local/bin/../share/cocl/cocl.Makefile: No such file or directory make: *\ No rule to make target '/usr/local/bin/../share/cocl/cocl.Makefile'. Stop.

3) I try to enable run-test, so I run this commands like this: cmake .. -DBUILD_TESTS=ON make -j4

get an error like this: http://pastebin.ubuntu.com/23435852/

hughperkins commented 8 years ago

miss copying cocl to /usr/local/bin when sudo make install:

Good point! Addressed in 2fc6a8a

fatal error: 'sys/cdefs.h' file not found

I had the same error on Ubuntu. I never quite understood it, but fixed it by doing something like:

sudo apt-get install libc6-dev:i386

(see for example:

alephman commented 8 years ago

$ cocl -fPIC cuda_sample.cu /usr/local/share/cocl/cocl.Makefile: No such file or directory?? there only have a cocl.cmake file in /usr/local/share/cocl folder.

hughperkins commented 8 years ago

Fixed it in latest master https://github.com/hughperkins/cuda-on-cl/commit/ac9ce6474bd43a625c0995ce63341b69198d62f7 . I'm building a dockerfile so I can check the build on a clean machine :-) https://github.com/hughperkins/cuda-on-cl/blob/master/docker/Dockerfile

alephman commented 8 years ago

Sounds very cool! How does the docker support GPU?

hughperkins commented 8 years ago

It doesnt. Unless you use eg nvidia-docker. Each vendor/driver would need to supply their own solution. But at lesat it means I can find the various install script bugs, instead of waiting for other people to find htem for me :-)

alephman commented 8 years ago

the test program you provided (https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3) still good.

but I still get the error when I run cocl -fPIC cuda_sample.cu:

building kernel _z8setvaluepfif cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed. Stack dump:

  1. Running pass 'UniFlex generator' on module 'BuildGroup_1'.
alephman commented 8 years ago

Can't find anything relatived information by google:(

hughperkins commented 8 years ago

the test program you provided (https://gist.github.com/hughperkins/61d43161e937b48271572f6cc9db78a3) still good.

Cool. Let's keep going :-)

Let's use cocl::Memory to do the device allocation:

https://gist.github.com/hughperkins/08bb351022e0ed3dcff251c5a1151b6e

hughperkins commented 8 years ago

oh wait...