hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0
865 stars, 199 forks

deepcl_unittests fails on Ubuntu 16.04 #116

Closed. dasha-5555-5 closed this issue 7 years ago.

dasha-5555-5 commented 7 years ago

Hi! I have a GeForce 820M card and tried to run deepcl_unittests. My clinfo log: https://pastebin.com/PLWecjBN

deepcl_unittests output:

DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
unknown file: Failure
C++ exception with description "Error getting OpenCL device ids for platform index 0: OpenCL errorcode: -1" thrown in the test body.
[ FAILED ] testdroplayer.simple_exception (1 ms)
[----------] 1 test from testdroplayer (1 ms total)

[----------] 1 test from testjpeghelper
[ RUN ] testjpeghelper.writeread
[ OK ] testjpeghelper.writeread (0 ms)
[----------] 1 test from testjpeghelper (0 ms total)

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (48098 ms total)
[ PASSED ] 135 tests.
[ FAILED ] 24 tests, listed below:
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout2_pz
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_pz
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small2
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh
[ FAILED ] testdropoutbackward.basic
[ FAILED ] testdropoutbackward.basic_2plane_batchsize2
[ FAILED ] testdropoutbackward.compare_args
[ FAILED ] testsgd.basic
[ FAILED ] testCLMathWrapper.assign
[ FAILED ] testCLMathWrapper.assignScalar
[ FAILED ] testCLMathWrapper.addinplace
[ FAILED ] testCLMathWrapper.multiplyinplace
[ FAILED ] testCLMathWrapper.addscalar
[ FAILED ] testCLMathWrapper.sqrt
[ FAILED ] testCLMathWrapper.squared
[ FAILED ] testCLMathWrapper.inverse
[ FAILED ] testCLMathWrapper.perelementmult
[ FAILED ] testreducesegments.basic
[ FAILED ] testGpuOp.addinplace
[ FAILED ] testGpuOp.addoutofplace
[ FAILED ] testGpuOp.inverse
[ FAILED ] testGpuOp.addscalarinplace
[ FAILED ] testdroplayer.simple_exception

24 FAILED TESTS
YOU HAVE 2 DISABLED TESTS

Any help would be appreciated.

hughperkins commented 7 years ago

The error message means you have no OpenCL-enabled GPU installed, or at least none with matching drivers that is also enabled in the ICD. Basically, a necessary prerequisite is that if you run:

sudo apt-get install clinfo
clinfo

... at least one device of type GPU should show up. I think that on your computer this is not currently the case?
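
The GPU check above can be scripted. A minimal sketch, assuming clinfo's plain-text output labels each device with a "Device Type" line, as in the clinfo logs in this thread; has_gpu_device is a hypothetical helper name, not part of clinfo or DeepCL.

```shell
# Exit status 0 if the clinfo output read on stdin lists at least one
# device of type GPU, 1 otherwise. clinfo labels each device with a
# line such as "Device Type    GPU". The NULL-platform lines like
# "CL_DEVICE_TYPE_GPU" do not match, because case and spacing differ.
has_gpu_device() {
  grep -qE 'Device Type[[:space:]]+GPU'
}
```

Usage: clinfo | has_gpu_device && echo "OpenCL GPU found" || echo "no OpenCL GPU visible"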

dasha-5555-5 commented 7 years ago

Here is the clinfo output:

clinfo
clinfo: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by clinfo)
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
Number of platforms 1
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.1.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
Platform Extensions function suffix Intel
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0

Platform Name Intel Gen OCL Driver
Number of devices 1
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
Device Name Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
Device Vendor Intel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 beignet 1.1.1
Driver Version 1.1.1
Device OpenCL C Version OpenCL C 1.2 beignet 1.1.1
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 20
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions <printDeviceInfo:138: get SPIR versions size : error -30>
printf() buffer size 1048576 (1024KiB)
Built-in kernels __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
clCreateContext(NULL, ...) [other] Success [Intel]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform

hughperkins commented 7 years ago

Ok. So you have beignet, and a beignet-compatible GPU, which is good :+1:

This bit doesn't look very healthy:

clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform

I think that should report a specific platform, rather than "No platform".

The "dri2 connection failed" messages are not ideal, but I used to get those on Linux with beignet, and everything still worked OK, so we can probably ignore them.

I think you have a fairly old version of beignet: 1.1.1, by the look of it? I think the current version is around 1.3.1. What happens if you upgrade beignet to a more recent version?
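
To check the installed beignet against a target such as 1.3.1, the version can be read from the clinfo "Platform Version" line. A minimal sketch, assuming that line looks like "Platform Version OpenCL 1.2 beignet 1.1.1", as in the clinfo output earlier in this thread; beignet_version and version_at_least are hypothetical helper names.

```shell
# Print the beignet version from clinfo output read on stdin, e.g.
# "1.1.1" for a "Platform Version OpenCL 1.2 beignet 1.1.1" line.
# Prints nothing if no beignet platform is listed.
beignet_version() {
  grep -oE 'beignet [0-9]+(\.[0-9]+)*' | head -n 1 | cut -d ' ' -f 2
}

# Succeed if dotted version $1 is at least $2, comparing up to three
# numeric fields (major.minor.patch) with POSIX sort keys.
version_at_least() {
  local lowest
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t . -k 1,1n -k 2,2n -k 3,3n | head -n 1)
  # If $2 sorts first (or the two are equal), $1 >= $2.
  [ "$lowest" = "$2" ]
}
```

Usage: v=$(clinfo | beignet_version); version_at_least "$v" 1.3.1 || echo "beignet $v is older than 1.3.1"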

hughperkins commented 7 years ago

Oh, looking at your first post, this line looks like something that needs to be fixed:

beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware

hughperkins commented 7 years ago

(i.e., it looks like the opencl-icd package perhaps needs to be tweaked, reinstalled, or uninstalled and then reinstalled?)

dasha-5555-5 commented 7 years ago

Thanks for your quick reply. I just upgraded beignet to v1.3.1, removed beignet-opencl-icd and rebuilt DeepCL, but the tests still fail. My clinfo log: https://pastebin.com/iJHcPj2T

inxi -G
Graphics: Card-1: Intel Haswell-ULT Integrated Graphics Controller
Card-2: NVIDIA GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M]
Display Server: X.Org 1.18.4 driver: nvidia Resolution: 1600x900@60.01hz
GLX Renderer: GeForce 820M/PCIe/SSE2 GLX Version: 4.5.0 NVIDIA 375.39

The deepcl_unittests output:

[ FAILED ] testGpuOp.addscalarinplace (0 ms)
[----------] 4 tests from testGpuOp (1 ms total)

[----------] 1 test from testdroplayer
[ RUN ] testdroplayer.simple_exception
open("/dev/dri/card0", O_RDWR) failed: Too many open files
open("/dev/dri/card1", O_RDWR) failed: Too many open files
Device open failed, aborting...
cl_get_gt_device(): error, unknown device: ffffffff
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
unknown file: Failure
C++ exception with description "Error getting OpenCL device ids for platform index 0: OpenCL errorcode: -1" thrown in the test body.
[ FAILED ] testdroplayer.simple_exception (0 ms)
[----------] 1 test from testdroplayer (0 ms total)

[----------] 1 test from testjpeghelper
[ RUN ] testjpeghelper.writeread
unknown file: Failure
C++ exception with description "can't open ~foo.jpeg" thrown in the test body.
[ FAILED ] testjpeghelper.writeread (0 ms)
[----------] 1 test from testjpeghelper (0 ms total)

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (41279 ms total)
[ PASSED ] 134 tests.
[ FAILED ] 25 tests, listed below:
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout2_pz
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_pz
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small2
[ FAILED ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh
[ FAILED ] testdropoutbackward.basic
[ FAILED ] testdropoutbackward.basic_2plane_batchsize2
[ FAILED ] testdropoutbackward.compare_args
[ FAILED ] testsgd.basic
[ FAILED ] testCLMathWrapper.assign
[ FAILED ] testCLMathWrapper.assignScalar
[ FAILED ] testCLMathWrapper.addinplace
[ FAILED ] testCLMathWrapper.multiplyinplace
[ FAILED ] testCLMathWrapper.addscalar
[ FAILED ] testCLMathWrapper.sqrt
[ FAILED ] testCLMathWrapper.squared
[ FAILED ] testCLMathWrapper.inverse
[ FAILED ] testCLMathWrapper.perelementmult
[ FAILED ] testreducesegments.basic
[ FAILED ] testGpuOp.addinplace
[ FAILED ] testGpuOp.addoutofplace
[ FAILED ] testGpuOp.inverse
[ FAILED ] testGpuOp.addscalarinplace
[ FAILED ] testdroplayer.simple_exception
[ FAILED ] testjpeghelper.writeread

25 FAILED TESTS
YOU HAVE 2 DISABLED TESTS
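
A long failure summary like this can be collapsed into the list of affected test suites, which is handy for rerunning them one at a time. A sketch, assuming googletest's "[ FAILED ] Suite.test" line format (with one or more spaces inside the brackets); failed_suites is a hypothetical helper name.

```shell
# Read googletest output on stdin and print, once each, the names of
# suites with at least one "[  FAILED  ] Suite.test" entry. The
# "[  FAILED  ] N tests, listed below:" line is not matched, because
# a suite name must be followed directly by a dot.
failed_suites() {
  grep -oE '\[ +FAILED +] +[A-Za-z0-9_]+\.' \
    | sed -E 's/^\[ +FAILED +] +//; s/\.$//' \
    | sort -u
}
```

Usage: ./deepcl_unittests 2>&1 | failed_suites. For the summary above it would print eight suite names, including testCLMathWrapper and testGpuOp.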

hughperkins commented 7 years ago

Can you provide the list of files in your /etc/OpenCL/vendors directory? I think there are three, right? I.e. nvidia, some clover-y one, and beignet? Can you remove all except beignet, and rerun clinfo, please? Also, DeepCL uses EasyCL to find the GPUs. Can you please do:

git clone --recursive https://github.com/hughperkins/EasyCL
cd EasyCL
mkdir build
cd build
ccmake ..
# press 'c'
# change 'CLEW' to 'OFF', press 'c' again
# press 'g'
make -j 8
./gpuinfo

... and provide the output of ./gpuinfo please?

dasha-5555-5 commented 7 years ago

Here is the list of files in /etc/OpenCL/vendors (I removed mesa.icd):

ll /etc/OpenCL/vendors
total 16
drwxr-xr-x 2 root root 4096 May 22 03:14 ./
drwxr-xr-x 3 root root 4096 May 9 20:53 ../
-rw-r--r-- 1 root root 33 May 22 02:04 intel-beignet.icd
-rw-r--r-- 1 root root 44 Nov 13 2015 intel-beignet-x86_64-linux-gnu.icd

clinfo: https://pastebin.com/LxKkChKQ

./gpuinfo output: https://pastebin.com/vYTVTGPk

hughperkins commented 7 years ago

Hmmm, you have two beignet ICDs. Can you remove one of them, please? Maybe remove the 2015 one and keep the recent one? Then rerun clinfo, gpuinfo and the DeepCL tests, and paste the outputs, please. (I'm not sure I can solve the issue, but it seems to be some fundamental ICD/driver issue, rather than something deep in the code somewhere.)

hughperkins commented 7 years ago

(Also, could I see the contents of the remaining ICD file, please?)
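
Listing each ICD file together with its contents makes duplicate or stale entries easy to spot. A minimal sketch, assuming the usual layout in which each .icd file holds a single line naming the OpenCL library for the ICD loader to open; show_icds is a hypothetical helper name.

```shell
# List each *.icd file in an OpenCL vendors directory (default
# /etc/OpenCL/vendors) with the library it names and whether that
# library path exists.
show_icds() {
  local dir="${1:-/etc/OpenCL/vendors}"
  local icd lib status
  for icd in "$dir"/*.icd; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$icd" ] || continue
    # The first line of an ICD file names the library to load.
    lib=$(head -n 1 "$icd")
    case "$lib" in
      */*)
        # A path: check that it exists on disk.
        if [ -e "$lib" ]; then status="found"; else status="MISSING"; fi
        ;;
      *)
        # A bare file name: looked up via the dynamic linker search path.
        status="bare name, resolved by the dynamic linker"
        ;;
    esac
    printf '%s -> %s (%s)\n' "$(basename "$icd")" "$lib" "$status"
  done
}
```

Usage: show_icds /etc/OpenCL/vendors. POSIX path resolution treats repeated slashes as a single separator, so a path such as /usr/local/lib/beignet//libcl.so is reported as found if the library is there.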

dasha-5555-5 commented 7 years ago

I removed that one. The content of /etc/OpenCL/vendors/intel-beignet.icd was: /usr/local/lib/beignet//libcl.so
(I changed it to /usr/local/lib/beignet/libcl.so)

clinfo: https://pastebin.com/zK7KB9Nv

gpuinfo: https://pastebin.com/Sj3h0Dg0

deepcl tests: https://pastebin.com/7QYfazUU

hughperkins commented 7 years ago

Hmmm, you know what, it looks like a ton of the DeepCL tests are actually passing? And then we get:

open("/dev/dri/card0", O_RDWR) failed: Too many open files
open("/dev/dri/card1", O_RDWR) failed: Too many open files

Possibly the driver doesn't like repeated connections to it? You can try running just a few tests at a time, like:

./deepcl_unittests tests=testCLMathWrapper.*
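
To cover the whole test set this way, each suite can be run in its own process, so any device file handles the driver opens are closed when that process exits. A sketch, assuming the tests=<pattern> filter form used in this thread; run_suites is a hypothetical helper name.

```shell
# Run each named test suite of the given test binary in a separate
# process. File descriptors opened during one batch are closed when
# that process exits, before the next batch starts.
# Returns 1 if any batch failed, 0 otherwise.
run_suites() {
  local bin="$1"
  shift
  local suite rc=0
  for suite in "$@"; do
    echo "=== $suite ==="
    # tests=<pattern> is the filter form used with deepcl_unittests here.
    "$bin" "tests=$suite.*" || rc=1
  done
  return "$rc"
}
```

Usage: run_suites ./deepcl_unittests testCLMathWrapper testGpuOp testdropoutforward testdropoutbackward testsgd testreducesegments testdroplayer. Raising the shell's open-file limit with ulimit -n (up to the hard limit) before a full run is another possible workaround for "Too many open files"; it was not tried in this thread.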

dasha-5555-5 commented 7 years ago

Yes, when I run ./deepcl_unittests tests=testCLMathWrapper.* these tests pass:

May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
a[0]=4 a[1]=6.3 a[2]=45 a[3]=37.5 a[4]=23
[ OK ] testCLMathWrapper.perelementmult (40 ms)
[----------] 9 tests from testCLMathWrapper (419 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test case ran. (419 ms total)
[ PASSED ] 9 tests.

Thanks a lot!

hughperkins commented 7 years ago

Cool :-)