naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

Gradient tests fail on Samsung Chromebook 2 #28

Open psyhtest opened 8 years ago

psyhtest commented 8 years ago

@naibaf7

With USE_GREENTEA := 1 in the build configuration, I see many Caffe test failures on a Samsung Chromebook 2 (ARM Cortex-A15 CPU, ARM Mali-T628 GPU) with this fork (latest commit 04503ee).

What they all seem to have in common is the word "Gradient" in their name. For example:

$ ./build/test/test_all.testbin --gtest_filter=*TestBNLLGradient
Setting to use device 0
Note: Google Test filter = *TestBNLLGradient
[==========] Running 4 tests from 4 test cases.
[----------] Global test environment set-up.
[----------] 1 test from NeuronLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] NeuronLayerTest/0.TestBNLLGradient
[       OK ] NeuronLayerTest/0.TestBNLLGradient (16 ms)
[----------] 1 test from NeuronLayerTest/0 (16 ms total)

[----------] 1 test from NeuronLayerTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] NeuronLayerTest/1.TestBNLLGradient
[       OK ] NeuronLayerTest/1.TestBNLLGradient (15 ms)
[----------] 1 test from NeuronLayerTest/1 (16 ms total)

[----------] 1 test from NeuronLayerTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] NeuronLayerTest/2.TestBNLLGradient
[       OK ] NeuronLayerTest/2.TestBNLLGradient (497 ms)
[----------] 1 test from NeuronLayerTest/2 (497 ms total)

[----------] 1 test from NeuronLayerTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN      ] NeuronLayerTest/3.TestBNLLGradient
./include/caffe/test/test_gradient_check_util.hpp:184: Failure
The difference between computed_gradient and estimated_gradient is 1.5616071734673349, which exceeds threshold_ * scale, where
computed_gradient evaluates to 1.5616071734673349,
estimated_gradient evaluates to 0, and
threshold_ * scale evaluates to 0.015616071734673349.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,0; feat = 1.2703554366494645; objective+ = 3.0355741737564483; objective- = 3.0355741737564483
...
[  FAILED  ] NeuronLayerTest/3.TestBNLLGradient, where TypeParam = caffe::GPUDevice<double> (530 ms)
[----------] 1 test from NeuronLayerTest/3 (531 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 4 test cases ran. (1061 ms total)
[  PASSED  ] 3 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NeuronLayerTest/3.TestBNLLGradient, where TypeParam = caffe::GPUDevice<double>

The suspicious lines are:

computed_gradient evaluates to 1.5616071734673349,
estimated_gradient evaluates to 0,

but sometimes I see the reverse: computed_gradient evaluates to 0, while estimated_gradient evaluates to a non-zero value.

This happens both for float and double tests.
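To make the failure message concrete: the check compares an analytically computed gradient with a finite-difference estimate built from two forward passes (objective+ and objective-). The Python sketch below is illustrative only and is not the code in test_gradient_check_util.hpp. The central-difference formula, the `step` and `threshold` values, and the floor of 1.0 on `scale` are assumptions, and `bnll` and `check_gradient` are hypothetical helper names. It shows that when both forward passes return the same objective, as in the debug line above, the estimate is exactly 0 and the check fails whatever the analytic value is.

```python
import math


def bnll(x):
    """BNLL activation log(1 + exp(x)), written to stay stable for large x."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def check_gradient(objective, computed_gradient, x, step=1e-2, threshold=1e-2):
    """Compare an analytic gradient with a central-difference estimate.

    `step` and `threshold` are illustrative values (assumptions), not
    necessarily the ones used by the failing test.
    """
    objective_plus = objective(x + step)
    objective_minus = objective(x - step)
    estimated_gradient = (objective_plus - objective_minus) / (2.0 * step)
    # Relative tolerance; the floor of 1.0 is an assumption of this sketch.
    scale = max(abs(computed_gradient), abs(estimated_gradient), 1.0)
    passed = abs(computed_gradient - estimated_gradient) <= threshold * scale
    return passed, estimated_gradient


x = 1.2703554366494645  # the `feat` value from the log above
analytic = 1.0 / (1.0 + math.exp(-x))  # d/dx log(1 + exp(x)) is the sigmoid

# Healthy case: the perturbation reaches the forward pass.
ok, estimate = check_gradient(bnll, analytic, x)

# Failure mode: the forward pass ignores the perturbation (simulated here by
# always evaluating at the unperturbed x), so objective+ == objective-.
stale_ok, stale_estimate = check_gradient(lambda _: bnll(x), analytic, x)

print(ok, estimate)
print(stale_ok, stale_estimate)
```

The second call reports a failed check with an estimate of 0.0, which mirrors the "estimated_gradient evaluates to 0" symptom. Whether the real cause is a perturbed input that never reaches the device is not established by this sketch.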

Any ideas?

naibaf7 commented 8 years ago

I'd need to know if there is a pattern in the data index: (top_id, top_data_id, blob_id, feat_id)=0,0,0,0. Can you find that out? Or just post some more indices and values from the failures.

Since I don't have one of those GPUs myself, it is a bit difficult to track down this problem. Otherwise, the runtests pass on Intel, AMD and NVIDIA chips.
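One way to look for such a pattern is to extract the index tuples from the `debug:` lines of a test log and tally them. The following Python sketch assumes the line format shown in the logs in this thread. `parse_debug_lines` is a hypothetical helper, and the sample text is copied from failures reported here rather than from a full log.

```python
import re
from collections import Counter

# Matches the "debug:" lines printed by the gradient checker in the logs.
DEBUG_LINE = re.compile(
    r"debug: \(top_id, top_data_id, blob_id, feat_id\)="
    r"(-?\d+),(-?\d+),(-?\d+),(-?\d+); "
    r"feat = ([^;\s]+); objective\+ = ([^;\s]+); objective- = ([^;\s]+)"
)


def parse_debug_lines(log_text):
    """Return one dict per debug line with the indices and objective values."""
    records = []
    for match in DEBUG_LINE.finditer(log_text):
        top_id, top_data_id, blob_id, feat_id = (int(g) for g in match.group(1, 2, 3, 4))
        feat, obj_plus, obj_minus = (float(g) for g in match.group(5, 6, 7))
        records.append({
            "top_id": top_id,
            "top_data_id": top_data_id,
            "blob_id": blob_id,
            "feat_id": feat_id,
            "feat": feat,
            "objective_plus": obj_plus,
            "objective_minus": obj_minus,
        })
    return records


sample_log = """\
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,0; feat = 1.2703554366494645; objective+ = 3.0355741737564483; objective- = 3.0355741737564483
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,1,0; feat = 0.97146224975585938; objective+ = -1.53898024559021; objective- = -1.53898024559021
debug: (top_id, top_data_id, blob_id, feat_id)=0,1,1,0; feat = 0.97146224975585938; objective+ = -1.1870282888412476; objective- = -1.1870282888412476
"""

records = parse_debug_lines(sample_log)
feat_id_counts = Counter(r["feat_id"] for r in records)
blob_id_counts = Counter(r["blob_id"] for r in records)
# Failures where both forward passes produced the same objective value.
unchanged = sum(1 for r in records if r["objective_plus"] == r["objective_minus"])

print(feat_id_counts, blob_id_counts, unchanged)
```

On these three sample lines, every failure has feat_id 0 and identical objective+ and objective- values. Three lines are far too few to call this a pattern, so the same tally would need to be run over a full log.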

295988101 commented 8 years ago

@psyhtest I'm planning to run Caffe on ARM, using OpenCL or CUDA. I tried another OpenCL version of Caffe, but in the end I failed because of the complicated cross-compilation. Could you tell me whether you have successfully run Caffe on ARM (OpenCL or CUDA)? It would be very kind of you to share some details. My email is zhenght5@gmail.com

Best regards.

jyegerlehner commented 8 years ago

@psyhtest The example you provide shows the test passes on the GPU for floats and the same test fails on the GPU for doubles. Are all the failed tests only double precision and on the GPU? If so, this suggests to me that perhaps the Mali GPU does not support double, which is optional under the OpenCL spec. Search for CL_DEVICE_DOUBLE_FP_CONFIG and clGetDeviceInfo, which is how the GPU indicates if it supports double.
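The question of whether the failures are confined to one device/precision combination can be answered from the gtest output by grouping the `[  FAILED  ]` lines by their TypeParam. A minimal Python sketch follows, assuming the log format shown earlier in this thread. `failed_by_type_param` is a hypothetical helper, and the sample text is taken from the single-test run above.

```python
import re
from collections import Counter

# "[  FAILED  ] <test name>, where TypeParam = <type>" with an optional
# trailing "(N ms)"; summary lines such as "[  FAILED  ] 1 test, listed
# below:" do not match.
FAILED_LINE = re.compile(
    r"^\[\s*FAILED\s*\]\s+(\S+), where TypeParam = (.+?)(?:\s+\(\d+ ms\))?[ \t]*$",
    re.MULTILINE,
)


def failed_by_type_param(log_text):
    """Count failed tests per TypeParam, counting each test name once."""
    type_of_test = {}
    for match in FAILED_LINE.finditer(log_text):
        # gtest prints a failed test both inline and in the final summary,
        # so key by test name to avoid double counting.
        type_of_test[match.group(1)] = match.group(2)
    return Counter(type_of_test.values())


sample_log = """\
[       OK ] NeuronLayerTest/2.TestBNLLGradient (497 ms)
[  FAILED  ] NeuronLayerTest/3.TestBNLLGradient, where TypeParam = caffe::GPUDevice<double> (530 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NeuronLayerTest/3.TestBNLLGradient, where TypeParam = caffe::GPUDevice<double>
"""

counts = failed_by_type_param(sample_log)
print(dict(counts))
```

For this sample the result is a single failure under caffe::GPUDevice<double>. Running it over a complete log would show whether float or CPU variants fail as well.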

psyhtest commented 8 years ago

@naibaf7 I'll attach a full log showing the Gradient related failures shortly.

@zhenghuitian Yes, I gave up on AMD's port of Caffe too because it used OpenCL 1.2 and C++ templates in kernels. But I did manage to run Caffe with a couple of patches to clBLAS v2.4. I'm also working on support for CLBlast. Our vision is to create an open framework for optimising CNNs on embedded platforms, which is outlined in our IWOCL abstract. All comments and contributions are welcome!

@jyegerlehner I strongly suspect that Mali does support double precision, as I was managing the OpenCL compiler team at ARM when it was implemented :). But perhaps I wasn't doing my job properly, and this omission somehow wasn't detected by conformance testing?.. :)

jyegerlehner commented 8 years ago

@psyhtest Hah hah OK I guess that rules that out. I thought it was a rather parsimonious theory though.

295988101 commented 8 years ago

@psyhtest Thank you for your answer. It helps me a lot. I am planning to use CLBlast instead of clBLAS because my ARM device does not have an AMD GPU. I am reading your IWOCL abstract. Thank you again.

psyhtest commented 8 years ago

@naibaf7

Please see the full (compressed) log from running the following command:

LD_LIBRARY_PATH=/data/install/lib-openblas-v0.2.18/lib:$LD_LIBRARY_PATH \
/data/caffe-naibaf7/build/test/test_all.testbin --gtest_filter=*Gradient* \
> /chronos_downloads/caffe-naibaf7.6c0fbdc.Gradient.log 2>&1
...
[==========] 494 tests from 138 test cases ran. (39528323 ms total)
[  PASSED  ] 384 tests.
[  FAILED  ] 110 tests, listed below:
...

Also attached is my Makefile.config.

psyhtest commented 8 years ago

I also observed a similar failure on Odroid-XU3 (similar chip to Chromebook 2 but with the Mali driver v4.0, rather than v6.0):

[ RUN      ] DeconvolutionLayerTest/2.TestGradient
./include/caffe/test/test_gradient_check_util.hpp:184: Failure
The difference between computed_gradient and estimated_gradient is 2, which exceeds threshold_ * scale, where
computed_gradient evaluates to 2,
estimated_gradient evaluates to 0, and
threshold_ * scale evaluates to 0.0020000000949949026.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,1,0; feat = 0.97146224975585938; objective+ = -1.53898024559021; objective- = -1.53898024559021
./include/caffe/test/test_gradient_check_util.hpp:184: Failure
The difference between computed_gradient and estimated_gradient is 2, which exceeds threshold_ * scale, where
computed_gradient evaluates to 2,
estimated_gradient evaluates to 0, and
threshold_ * scale evaluates to 0.0020000000949949026.
debug: (top_id, top_data_id, blob_id, feat_id)=0,1,1,0; feat = 0.97146224975585938; objective+ = -1.1870282888412476; objective- = -1.1870282888412476

It is, however, much more intermittent. (I could not reproduce it since.)

naibaf7 commented 8 years ago

@psyhtest Yes. I am currently checking whether there are obvious parts of the code/kernels that could be problematic on these devices. After that, I would like to run actual tests on the hardware.