hughperkins / cltorch

An OpenCL backend for torch.

cltorch test failed: OpenCL library not found error #28

Closed hitalex closed 9 years ago

hitalex commented 9 years ago

I have installed the OpenCL library (the official AMD version, v3) and ran luarocks install cltorch without errors, but when I run the test with luajit -l cltorch -e 'cltorch.test()', the following error occurs:

luajit: Something went wrong: OpenCL library not found at /tmp/luarocks_cltorch-scm-1-7340/cltorch/cltorch/src/init.cpp:225
stack traceback:
    [C]: at 0x7f6e794ed930
    [C]: in function 'require'
    /home/kqc/torch/install/share/lua/5.1/cltorch/init.lua:19: in main chunk
    [C]: at 0x00463eb0
    [C]: at 0x00406240

OS: Ubuntu 12.04 LTS
AMD card: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M]
OpenCL version: v3 (newest)
AMD-Catalyst driver version: 15.9 (newest)

I have run the clinfo command:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.11)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: AMD Radeon 6600M and 6700M Series Device Topology: PCI[ B#1, D#0, F#0 ] Max compute units: 6 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 4 Native vector width double: 0 Max clock frequency: 485Mhz Address bits: 32 Max memory allocation: 268435456 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Max pipe arguments: 0 Max pipe active reservations: 0 Max pipe packet size: 0 Max global variable size: 0 Max global variable preferred total size: 0 Max read/write image args: 0 Max on device events: 0 Queue on device max size: 0 Max on device queues: 0 Queue on device preferred size: 0 SVM capabilities:
Coarse grain buffer: No Fine grain buffer: No Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities:
Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties:
Out-of-Order: No Profiling : Yes Queue on Device properties:
Out-of-Order: No Profiling : No Platform ID: 0x7f2bb1463430 Name: Turks Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 1.2 Driver version: 1800.11 Profile: FULL_PROFILE Version: OpenCL 1.2 AMD-APP (1800.11) Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

Device Type: CL_DEVICE_TYPE_CPU Vendor ID: 1002h Board name:
Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 8 Preferred vector width double: 4 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 2 Native vector width float: 8 Native vector width double: 4 Max clock frequency: 800Mhz Address bits: 64 Max memory allocation: 2147483648 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 64 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 6165118976 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Max pipe arguments: 16 Max pipe active reservations: 16 Max pipe packet size: 2147483648 Max global variable size: 1879048192 Max global variable preferred total size: 1879048192 Max read/write image args: 64 Max on device events: 0 Queue on device max size: 0 Max on device queues: 0 Queue on device preferred size: 0 SVM capabilities:
Coarse grain buffer: No Fine grain buffer: No Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 1 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities:
Execute OpenCL kernels: Yes Execute native function: Yes Queue on Host properties:
Out-of-Order: No Profiling : Yes Queue on Device properties:
Out-of-Order: No Profiling : No Platform ID: 0x7f2bb1463430 Name: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz Vendor: GenuineIntel Device OpenCL C version: OpenCL C 1.2 Driver version: 1800.11 (sse2,avx) Profile: FULL_PROFILE Version: OpenCL 1.2 AMD-APP (1800.11) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

hitalex commented 9 years ago

Could anybody help me fix this problem?

hughperkins commented 9 years ago

Hmmm, that's a bit of a mystery. In theory, you've got the hard bits working, i.e. clinfo is working OK... Can you confirm that you're running clinfo from the same terminal / command prompt that you are running torch from, and that you're using sudo for neither of them?

hughperkins commented 9 years ago

I think you could try getting EasyCL working first:

git clone --recursive https://github.com/hughperkins/EasyCL.git
cd EasyCL
mkdir build
cd build
ccmake ..
# press 'c', set BUILD_TESTS=ON, Build_type = Debug, 'c' configure, 'g' generate
# (ccmake should exit when you press 'g')
make -j 4 install
./gpuinfo
cp ../test/*.cl .
./easycl_unittests

If that doesn't work, please go back to the ccmake step, turn off USE_CLEW, and retry all the remaining steps as before.

hughperkins commented 9 years ago

啊.你在北京.离我不远 :-)

Ah, you're in Beijing. Not far from me :-)

hitalex commented 9 years ago

@hughperkins: You can even speak Chinese! I will switch to English...

Yes, installing the AMD graphics driver and OpenCL cost me some time.

Bad news: when USE_CLEW is set to ON, ./gpuinfo outputs: opencl library not found. When USE_CLEW is set to OFF, the following CMake error occurs:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
OPENCL_LIBRARIES
    linked by target "EasyCL" in directory /home/kqc/github/EasyCL
    linked by target "gpuinfo" in directory /home/kqc/github/EasyCL

Does this have something to do with my OpenCL installation? It seems to work without problems; perhaps it is the installation directory?

hughperkins commented 9 years ago

Ok, so:

sudo apt-get install opencl-headers

... and then, hmmm, well, I was expecting you would need to install something like libopencl-dev, but that doesn't seem to exist, so maybe just the headers. But you'll need to manually set OPENCL_LIBRARIES to the full path to libOpenCL.so.

Does this have something to do with my OpenCL installation? It seems to work without problems; perhaps it is the installation directory?

Is there something unusual about the installation directory, or the LD_LIBRARY_PATH configuration you are using? It's odd that clinfo manages to find it OK though...
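Setting OPENCL_LIBRARIES by hand first requires knowing where the library file actually lives. The following is a minimal sketch of one way to look for it, not part of EasyCL: the function name find_opencl and the directories in the usage comment are assumptions made for illustration. It checks each directory for both the unversioned and the versioned file name and prints the first match.

```shell
# Print the first OpenCL library found in the given directories.
# Both libOpenCL.so and libOpenCL.so.1 are tried in each directory.
# find_opencl is a hypothetical helper, not an EasyCL or cltorch command.
find_opencl() {
    for dir in "$@"; do
        for name in libOpenCL.so libOpenCL.so.1; do
            if [ -e "$dir/$name" ]; then
                printf '%s\n' "$dir/$name"
                return 0
            fi
        done
    done
    return 1
}

# Possible usage from the EasyCL build directory (directories are guesses):
#   lib=$(find_opencl /usr/lib /usr/lib/x86_64-linux-gnu) &&
#       cmake -DUSE_CLEW=OFF -DOPENCL_LIBRARIES="$lib" ..
```

Passing the value with -D on the cmake command line is the non-interactive equivalent of typing it into ccmake.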

hitalex commented 9 years ago

@hughperkins: I have solved my problem by creating libOpenCL.so in /usr/lib/. It turns out that the OpenCL installation process only creates /usr/lib/libOpenCL.so.1, but cltorch will only find the library under the name /usr/lib/libOpenCL.so.
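The thread does not say exactly how libOpenCL.so was created. One common approach is a symbolic link to the versioned file, sketched below under that assumption. The function name link_opencl is hypothetical, and the default directory /usr/lib is simply the one reported in this thread.

```shell
# Create libOpenCL.so as a symlink to libOpenCL.so.1 in the given directory.
# link_opencl is a hypothetical helper; the directory defaults to /usr/lib.
link_opencl() {
    libdir="${1:-/usr/lib}"
    if [ ! -e "$libdir/libOpenCL.so.1" ]; then
        echo "no libOpenCL.so.1 in $libdir" >&2
        return 1
    fi
    # Leave an existing libOpenCL.so alone; otherwise link by relative name.
    [ -e "$libdir/libOpenCL.so" ] || ln -s libOpenCL.so.1 "$libdir/libOpenCL.so"
}

# Writing to /usr/lib needs root, so the equivalent one-off command would be:
#   sudo ln -s libOpenCL.so.1 /usr/lib/libOpenCL.so
```

A relative link target keeps the link valid even if the directory is later moved or mounted elsewhere.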

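This post helped me a lot: http://www.cnblogs.com/zenny-chen/p/3307946.html "在你用-l的时候,如果动态库文件后缀名为.so.1,那么得把文件名后缀.1去掉。" (In English: "When you use -l, if the shared library file has the suffix .so.1, you have to drop the .1 from the file name.")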

Thank you so much. You have been so helpful! Good day!

hughperkins commented 9 years ago

Ah, interesting. That's good information, Qingchao. I might modify clew to also search for libOpenCL.so.1.

hughperkins commented 9 years ago

Fixed in 67f24cf, in theory.