Closed by jainanshul 8 years ago
@jainanshul That means your OpenCL compiler was unable to compile one (or more) of the OpenCL kernels. Unfortunately, if the compiler does not throw an error by itself, it's currently hard to find out which kernels did not compile.
I'll try to figure out a fix for this.
For your information, here is the result of caffe device_query:
I0315 15:08:22.113821 1972355072 common.cpp:371] Total devices: 2
I0315 15:08:22.114439 1972355072 common.cpp:372] CUDA devices: 0
I0315 15:08:22.114446 1972355072 common.cpp:373] OpenCL devices: 2
I0315 15:08:22.114450 1972355072 common.cpp:397] Device id: 0
I0315 15:08:22.114454 1972355072 common.cpp:399] Device backend: OpenCL
I0315 15:08:22.114459 1972355072 common.cpp:401] Backend details: Apple: OpenCL 1.2 (Dec 8 2015 17:02:20)
I0315 15:08:22.114480 1972355072 common.cpp:403] Device vendor: Intel
I0315 15:08:22.114486 1972355072 common.cpp:405] Name: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
I0315 15:08:22.114491 1972355072 common.cpp:407] Total global memory: 17179869184
I0315 15:08:22.114496 1972355072 common.cpp:397] Device id: 1
I0315 15:08:22.114500 1972355072 common.cpp:399] Device backend: OpenCL
I0315 15:08:22.114503 1972355072 common.cpp:401] Backend details: Apple: OpenCL 1.2 (Dec 8 2015 17:02:20)
I0315 15:08:22.114508 1972355072 common.cpp:403] Device vendor: Intel
I0315 15:08:22.114513 1972355072 common.cpp:405] Name: Iris Pro
I0315 15:08:22.114516 1972355072 common.cpp:407] Total global memory: 1610612736
@jainanshul
Does the error happen on both device 0 and 1, or only on the Iris Pro GPU?
You could try the following:
Remove all the lines
ss << "#ifdef DOUBLE_SUPPORT_AVAILABLE" << "\n\n"; // NOLINT
to
ss << tile_double << "\n\n"; // NOLINT
ss << "#endif" << "\n\n";
from cl_kernels.cpp, then recompile and run again. This basically disables double support manually.
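As a rough illustration of what that manual change amounts to, here is a minimal sketch of a source builder that skips the double-precision section. It is not the actual code in cl_kernels.cpp: the function `BuildKernelSource`, its parameters, and the `enable_double` flag are hypothetical, and the kernel strings passed in are stand-ins for the generated ones such as `tile_double`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: concatenates kernel sources the way a stringstream
// based generator might, emitting the double kernels only when requested.
std::string BuildKernelSource(const std::string& float_kernels,
                              const std::string& double_kernels,
                              bool enable_double) {
  // The float kernels are always needed, so an empty string is a caller bug.
  assert(!float_kernels.empty());
  std::stringstream ss;
  ss << float_kernels << "\n\n";
  if (enable_double) {
    // Same guard as in the lines quoted above.
    ss << "#ifdef DOUBLE_SUPPORT_AVAILABLE" << "\n\n";
    ss << double_kernels << "\n\n";
    ss << "#endif" << "\n\n";
  }
  return ss.str();
}
```

Calling it with `enable_double` set to `false` has the same effect as deleting the guarded lines by hand, without touching the generated file.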
Happens on both device 0 and 1. If I omit flag GPU and run on CPU then I don't see an exception. Let me try your suggestion.
There was only one line in cl_kernels.cpp that I replaced with
ss << tile_double << "\n\n"; // NOLINT
ss << "#endif" << "\n\n";
Still the same crash.
I'm not sure you did what I meant... I proposed to remove all double kernels, which means to remove lines 107 to 139 from https://github.com/BVLC/caffe/blob/opencl/src/caffe/greentea/cl_kernels.cpp
However, I'll come up with a way to identify the failing kernels individually by the end of the week, for proper debugging.
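One possible shape for such per-kernel identification is sketched below. This is only an assumption about how it could work, not the fix that was eventually implemented: `NamedKernel`, `BuildFailure`, `CompileFn`, and `FindFailingKernels` are hypothetical names, and the compile step is abstracted behind a callback. In a real OpenCL build, that callback would presumably wrap `clBuildProgram` and read the log through `clGetProgramBuildInfo` with `CL_PROGRAM_BUILD_LOG`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one kernel source string and its label.
struct NamedKernel {
  std::string name;
  std::string source;
};

// Hypothetical record for one kernel that failed to build.
struct BuildFailure {
  std::string name;
  std::string log;
};

// Hypothetical compile hook: returns true on success; on failure it
// returns false and writes the compiler output into *log.
using CompileFn =
    std::function<bool(const std::string& source, std::string* log)>;

// Compiles each kernel on its own so that a failure can be attributed
// to a specific kernel instead of to the whole concatenated program.
std::vector<BuildFailure> FindFailingKernels(
    const std::vector<NamedKernel>& kernels, const CompileFn& compile) {
  assert(static_cast<bool>(compile));  // an empty callback is a caller bug
  std::vector<BuildFailure> failures;
  for (const NamedKernel& kernel : kernels) {
    std::string log;
    if (!compile(kernel.source, &log)) {
      failures.push_back({kernel.name, log});
    }
  }
  return failures;
}
```

Compiling kernels one at a time is slower than building a single program, so a sketch like this would make most sense as a debugging mode rather than the default path.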
@naibaf7 sorry, I misunderstood your intent. Deleting these lines and recompiling doesn't fix the crash. However, I was able to fix the crash by applying https://gist.github.com/jainanshul/93e932cc9f31e96adf3d. This of course means the timings are all 0s, but it shows the root cause of the crash.
@jainanshul Right. Ironically, this was a change proposed by an Intel PR. Funny that it breaks for Iris Pro on Mac. I made some changes to the benchmark code. Please check whether this fixes it.
Using caffe/opencl tree with SHA 945f20bd0452893704239b29a8697e7cfc4378bf
OS: OS X on a MacBook Pro with Intel Iris Pro GPU
Flags used to compile caffe opencl: -DUSE_CUDA=OFF (left the rest of the flags as default)
STR:
Error: