Check your environment. Failed to load aparapi native library

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Download Aparapi_2013_01_23_linux_x86_64.zip
2. Unzip the file
3. Build and run each of the samples

What is the expected output? What do you see instead?
I expect the code to run on the GPU, but instead I get the error:

"Check your environment. Failed to load aparapi native library aparapi_x86_64 
or possibly failed to locate opencl native library (opencl.dll/opencl.so). 
Ensure that both are in your PATH (windows) or in LD_LIBRARY_PATH 
(linux).Execution mode=JTP"

What version of the product are you using? On what operating system?
I ran it successfully on my desktop running
Linux version 3.8.0-30-generic (buildd@allspice) (gcc version 4.6.3 
(Ubuntu/Linaro 4.6.3-1ubuntu5) ) #44~precise1-Ubuntu SMP Fri Aug 23 18:32:41 
UTC 2013

but it fails when I try to run it on a server, where cat /proc/version reports
Linux version 2.6.32-358.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc 
version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri Feb 22 00:31:26 UTC 
2013

and clinfo reports

Number of platforms:                 2
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 AMD-APP (1214.3)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.1 CUDA 4.2.1
  Platform Name:                 NVIDIA CUDA
  Platform Vendor:               NVIDIA Corporation
  Platform Extensions:               cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Board name:                    
  Max compute units:                 64
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          8
  Preferred vector width double:         4
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             8
  Native vector width double:            4
  Max clock frequency:               1400Mhz
  Address bits:                  64
  Max memory allocation:             33826476032
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                135305904128
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007f7fff5aefc0
  Name:                      AMD Opteron(tm) Processor 6376                 
  Vendor:                    AuthenticAMD
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1214.3 (sse2,avx,fma4)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1214.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt 

  Platform Name:                 NVIDIA CUDA
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     4318
  Max compute units:                 14
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               64
  Max work group size:               1024
  Preferred vector width char:           1
  Preferred vector width short:          1
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          1
  Native vector width short:             1
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1147Mhz
  Address bits:                  32
  Max memory allocation:             1610530816
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                32768
  Max image 2D height:               32768
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4352
  Alignment (bits) of base address:      4096
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               128
  Cache size:                    229376
  Global memory size:                6442123264
  Constant buffer size:              65536
  Max number of constant args:           9
  Local memory type:                 Scratchpad
  Local memory size:                 49152
  Kernel Preferred work group size multiple:     32
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1000
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:              
    Out-of-Order:                Yes
    Profiling :                  Yes
  Platform ID:                   0x000000000075d490
  Name:                      Tesla C2075
  Vendor:                    NVIDIA Corporation
  Device OpenCL C version:           OpenCL C 1.1 
  Driver version:                319.37
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.1 CUDA
  Extensions:                    cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 

Please provide any additional information below.
I have checked LD_LIBRARY_PATH and it is set to
:/opt/AMDAPP/lib/x86_64:/opt/AMDAPP/lib/x86:/usr/local/cuda/lib64:/usr/lib/openmpi/lib:/usr/local/lib

Original issue reported on code.google.com by eelsswim...@gmail.com on 20 Sep 2013 at 3:18

GoogleCodeExporter commented 8 years ago

Thanks for reporting. 

First, I *assume* that you are running a 64-bit JVM (check with java -version). I know 
you may have checked already, but this is the error you will get if a 32-bit 
JVM tries to load a 64-bit library.
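
A quick way to check the JVM side of this is to look at what java -version prints. The sketch below is only illustrative: the helper name jvm_bitness is made up here, and it assumes a HotSpot or OpenJDK style banner that contains "64-Bit". Other JVMs may word the banner differently, so "unknown" means the result should be checked by hand.

```shell
#!/usr/bin/env bash
# jvm_bitness is a hypothetical helper, not part of Aparapi or the JDK.
# It classifies the banner text printed by `java -version`.
# Assumption: 64-bit HotSpot/OpenJDK builds include "64-Bit" in the VM line.
jvm_bitness() {
  case "$1" in
    *64-[Bb]it*) echo 64 ;;
    *) echo unknown ;;
  esac
}

# Only query a JVM if one is actually on the PATH.
# `java -version` writes its banner to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
  echo "JVM bitness: $(jvm_bitness "$(java -version 2>&1)")"
fi
```

Running file on libaparapi_x86_64.so should likewise report a 64-bit ELF object if the library matches a 64-bit JVM.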

My only concern is that the CUDA device is only OpenCL 1.1.  So we are loading 
a 1.2 icd (the common infrastructure for loading OpenCL platforms) and then 
finding a 1.1 OpenCL device.  

Does the /usr/local/cuda/lib64 dir also contain OpenCL.so? Where in your path 
are you expecting to load the NVidia 1.1 OpenCL runtime from?

Try taking the AMD dirs out of the path (I am loath to say this as an AMD 
employee ;) ). You should be able to rely solely on the OpenCL runtime from 
NVidia.

Gary

Original comment by frost.g...@gmail.com on 20 Sep 2013 at 4:01

GoogleCodeExporter commented 8 years ago

There is not an OpenCL.so anywhere on either computer. There is a libOpenCL.so 
on my desktop (the working computer) in /usr/lib, but not anywhere else in the 
LD_LIBRARY_PATH. On the server (the non-working computer) I changed the 
LD_LIBRARY_PATH to include only one location of libOpenCL.so, /usr/lib/nvidia, 
and it still gives me the same error.
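
To see which directories on a search path actually hold a given library, something like the following may help. It is a minimal sketch: find_in_path is a hypothetical helper name, and it only tests for an exact file name, so a versioned file such as libOpenCL.so.1 would not match.

```shell
#!/usr/bin/env bash
# find_in_path is a hypothetical helper for illustration only.
# Usage: find_in_path <file name> <colon-separated search path>
# Prints each directory in the search path that contains the file.
# The function body is a subshell, so the IFS change does not leak out.
find_in_path() (
  lib="$1"
  IFS=':'
  for dir in $2; do
    # Empty entries (for example from a leading ':') are skipped here.
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
    fi
  done
)

# Check the current LD_LIBRARY_PATH; prints nothing if no entry matches.
find_in_path libOpenCL.so "${LD_LIBRARY_PATH:-}"
```

The same function can be pointed at the aparapi library, for example find_in_path libaparapi_x86_64.so "$LD_LIBRARY_PATH".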

Original comment by eelsswim...@gmail.com on 20 Sep 2013 at 6:54

GoogleCodeExporter commented 8 years ago

Sorry I meant libOpenCL.so.  I hope (and assume) that the libOpenCL.so is 
compatible with your device. 

So try adding the path to libOpenCL.so to your java.library.path using 
something like

java -Djava.library.path=${PATH_TO_APARAPI_DIR_CONTAINING_SO}:${PATH_TO_LIBOPENCL_SO} classpath + class + args

Gary

Original comment by frost.g...@gmail.com on 20 Sep 2013 at 7:07

GoogleCodeExporter commented 8 years ago

F.Y.I., we're using CentOS 6.4 and glibc 2.12. Following your hints about 
LD_LIBRARY_PATH, ldd showed a glibc version mismatch: 
libaparapi_x86_64.so requires glibc 2.14. We upgraded to 2.14, which fixed the 
problem. Thank you for your help.
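
For anyone hitting the same thing, the check can be sketched roughly as follows. This is not the exact procedure used above: max_glibc_required and version_at_least are hypothetical helpers, APARAPI_LIB is a made-up variable for the path to the library, and the script assumes GNU sort (for -V), objdump from binutils, and a glibc system where getconf GNU_LIBC_VERSION is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Aparapi.

# Read text on stdin (e.g. `objdump -T` or `ldd` output) and print the
# highest GLIBC_x.y version tag found in it, without the GLIBC_ prefix.
max_glibc_required() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Print "yes" if version $1 is at least version $2, otherwise "no".
version_at_least() {
  if [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]; then
    echo yes
  else
    echo no
  fi
}

# APARAPI_LIB is an assumed variable; the default path is a placeholder.
lib="${APARAPI_LIB:-./libaparapi_x86_64.so}"
if [ -f "$lib" ] && command -v objdump >/dev/null 2>&1; then
  need="$(objdump -T "$lib" | max_glibc_required)"
  have="$(getconf GNU_LIBC_VERSION | awk '{print $2}')"
  echo "library needs glibc >= ${need:-none}, system has $have"
  if [ -n "$need" ]; then
    echo "compatible: $(version_at_least "$have" "$need")"
  fi
fi
```

Running ldd directly on the library gives a quicker but less structured view, since it lists any required symbol version that the installed glibc cannot satisfy.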

Original comment by eelsswim...@gmail.com on 20 Sep 2013 at 8:55

GoogleCodeExporter commented 8 years ago

Great.  I am glad we sorted that. 

And thanks for reporting back. 

I never thought of glibc causing this.  I get to learn something new every day 
:) 

I will close this issue.

Original comment by frost.g...@gmail.com on 20 Sep 2013 at 9:40

Changed state: Done

GoogleCodeExporter commented 8 years ago

If you're on Windows, you have to copy the aparapi .dll file into your 
System32 or SysWOW64 folder (the two folders that hold all the DLL files).

Original comment by henry333...@gmail.com on 8 Jun 2015 at 6:37

tigerneil / aparapi

Check your environment. Failed to load aparapi native library #129