naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

Error building greentea_math_functions.cpp #17

Closed GAnthony closed 8 years ago

GAnthony commented 8 years ago

Hi: First, thanks for sharing this project!

I'm trying to build on the following:

% cmake -DUSE_CUDA=OFF -DUSE_GREENTEA=ON -DUSE_OPENCV=OFF -DUSE_LEVELDB=OFF -DUSE_LMDB=OFF ../caffe

=> Configuration summary in attached config_summary.txt.

% make all > make.txt 2>&1 => Build errors/warnings in attached make.txt

I believe I have all the right dependencies and even updated gcc to 4.9.3 from gcc 4.8.

The main error is:

/usr/include/viennacl/matrix.hpp:249:83: error: 'unsigned int' is not a class, struct, or union type
typedef typename F::orientation_category orientation_category;

If you get a chance, I'd appreciate it if you could point out anything I might be doing wrong.

My final goal is to get this working via OpenCL on an ARM+DSP device, but the Chromebook 2 is a first step.

config_summary.txt make.txt

naibaf7 commented 8 years ago

@GAnthony Sure, I'll look into it. The performance might be bad on ARM chips with GPU though, depending on how well the OpenCL drivers and chips work. I'll look into improving that, but my only ARM test device at the moment is an Android phone, so it'll take a bit longer.

Which version of ViennaCL are you using? Try 1.7.1 from here: http://viennacl.sourceforge.net/viennacl-download.html

GAnthony commented 8 years ago

I used the travis_install.sh script, which installed ViennaCL-1.5.1.
I'll try version 1.7.1 and report back.

GAnthony commented 8 years ago

Thanks! So installing ViennaCL-1.7.1 got compilation to pass the first error.

However, looks like OpenCV is a hard dependency, despite setting USE_OPENCV=OFF in cmake.

The next error complains about not finding opencv2 headers:

[ 13%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/affinity_layer.cpp.o
/home/gpitney/caffe/src/caffe/layers/affinity_layer.cpp:2:39: fatal error: opencv2/highgui/highgui.hpp: No such file or directory
#include <opencv2/highgui/highgui.hpp>

Perhaps the opencl branch is behind master? Should I rebase?

naibaf7 commented 8 years ago

@GAnthony The affinity layer is one I added, and I forgot to set the flags there to exclude the layer when OpenCV is not enabled. What you can do for now is just delete the affinity_layer.cpp and affinity_layer.hpp. I'll fix the branch meanwhile.
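A minimal sketch of the kind of guard described here, assuming the build defines the USE_OPENCV macro only when OpenCV is enabled. The helper opencv_enabled() is hypothetical and only makes the effect of the guard visible; it is not part of Caffe.

```cpp
#include <cassert>

// Pull in the OpenCV header only when the build enables it, so the file
// still compiles on systems where opencv2 is not installed.
#ifdef USE_OPENCV
#include <opencv2/highgui/highgui.hpp>
#endif

// Hypothetical helper: reports whether the OpenCV-dependent code path
// was compiled into this translation unit.
bool opencv_enabled() {
#ifdef USE_OPENCV
  return true;   // layer code that needs OpenCV would live in this branch
#else
  return false;  // OpenCV-dependent code is compiled out entirely
#endif
}
```

Compiling with -DUSE_OPENCV switches the first branch on; without it, neither the include nor the dependent code is seen by the compiler.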

naibaf7 commented 8 years ago

@GAnthony Should be fixed now.

GAnthony commented 8 years ago

Thanks! Got further, then hit another:

Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/malis_loss_layer.cpp.o
/home/gpitney/caffe/src/caffe/layers/malis_loss_layer.cpp:2:39: fatal error: opencv2/highgui/highgui.hpp: No such file or directory
#include <opencv2/highgui/highgui.hpp>

I put in a USE_OPENCV guard on that file's contents also (similar to your fixes), continued building, then hit:

[100%] Building CXX object python/CMakeFiles/pycaffe.dir/caffe/_caffe.cpp.o
In file included from /usr/include/python2.7/numpy/arrayobject.h:4:0,
from /home/gpitney/caffe/python/caffe/_caffe.cpp:11:
/home/gpitney/caffe/python/caffe/_caffe.cpp: In member function 'PyObject* caffe::NdarrayCallPolicies::postcall(PyObject*, PyObject*)':
/home/gpitney/caffe/python/caffe/_caffe.cpp:187:71: error: invalid conversion from 'long int*' to 'npy_intp* {aka int*}' [-fpermissive]
PyObject* arr_obj = PyArray_SimpleNewFromData(num_axes, dims.data(),

To make sure all the Python dependencies were installed, I successfully ran: % sudo pip install -r ../caffe/python/requirements.txt

So, at this point looks like I'm in pure Caffe build dependency discovery mode, so I think we can close this open issue.

Many thanks for your help!

GAnthony commented 8 years ago

OK, just to follow up, got past the above compile error simply by changing npy_long to npy_int in _caffe.cpp at the offending line.

After that, I was able to run % make runtest, which failed only 16 of 1751 tests on the ARM Mali GPU. (I'm guessing that's pretty good for an OpenCL GPU that Caffe doesn't officially support.)
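For context on the npy_long to npy_int change: the compiler rejects the call because a pointer to one integer type never converts implicitly to a pointer to another, even where both integers have the same width. The sketch below shows one portable alternative, declaring the shape with the API's own index type. It is an illustration, not the fix applied in this thread. The names intp_t, count_elements and blob_count are hypothetical stand-ins (intp_t for NumPy's pointer-sized npy_intp, count_elements for a function that takes its shape the way PyArray_SimpleNewFromData does).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for npy_intp: a signed integer as wide as a pointer.
using intp_t = std::intptr_t;

// Hypothetical stand-in for an API that receives the shape as a pointer
// to pointer-sized integers.
std::size_t count_elements(int num_axes, const intp_t* dims) {
  std::size_t count = 1;
  for (int i = 0; i < num_axes; ++i) {
    count *= static_cast<std::size_t>(dims[i]);
  }
  return count;
}

// Copying the shape into a vector of the API's index type keeps the pointer
// types identical on 32-bit and 64-bit targets, so no per-platform edit
// is needed at the call site.
std::size_t blob_count(const std::vector<int>& shape) {
  std::vector<intp_t> dims(shape.begin(), shape.end());
  return count_elements(static_cast<int>(dims.size()), dims.data());
}

// long* and int* are distinct pointer types with no implicit conversion,
// regardless of how wide long and int happen to be on the target.
static_assert(!std::is_convertible<long*, int*>::value,
              "long* does not convert implicitly to int*");
```

With this shape, the same source builds whether the pointer-sized integer is int (as on the 32-bit ARM target here) or long.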

naibaf7 commented 8 years ago

@GAnthony Yes, it's a very good start. Which tests did it fail? Would be great if you could post a log of the failed tests so that I can potentially fix it.

GAnthony commented 8 years ago

To run the LeNet MNIST example, I had to rebuild with USE_LMDB and USE_LEVELDB, and also tweak some mdb calls in convert_mnist_data.cpp and db_lmdb.cpp, following the guidance here: http://planspace.org/20150614-the_nvidia_jetson_tk1_with_caffe_on_mnist/ and other posts.

After that, the make runtest failures actually dropped to only 8 (caffe_tests_2.txt).

I currently have a LeNet network training on the Mali-T628 GPU with OpenCL on my Chromebook 2! (Accuracy = 0.9844 and climbing!). Indeed, it is slower than the CPU version, but I have yet to run any benchmarks.

One interesting warning was produced from train_lenet.sh:

I0220 00:23:45.624104 9585 net.cpp:351] The NetState phase (1) differed from the phase (0) specified by a rule in layer mnist

...but it seems this isn't preventing the training.

Once I get a little more time, I can step into the individual tests as well, to gain some understanding.

naibaf7 commented 8 years ago

@GAnthony Oh, that's convolution and inner product gradients (backpropagation) failing, usually not a good sign. I suspect however that this is because the Mali OpenCL compiler chose to compile the convolution in a mathematically/numerically unsafe way. In the best case, the gradients just differ a bit more than expected and it should otherwise be fine. If the gradients differ a lot, this would be rather concerning.
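To judge whether gradients "differ a bit" or "differ a lot", one common approach is to compare the analytic gradient against a central finite-difference estimate and look at the largest relative gap. The sketch below illustrates that idea on a toy loss. It is not Caffe's own gradient test code, and toy_loss, toy_gradient, numeric_gradient and max_relative_error are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Loss = std::function<double(const std::vector<double>&)>;

// Toy loss for illustration: f(x) = 0.5 * sum(x_i^2), whose gradient is x.
double toy_loss(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += 0.5 * v * v;
  return sum;
}

std::vector<double> toy_gradient(const std::vector<double>& x) { return x; }

// Central-difference estimate of the partial derivative with respect to x[i].
double numeric_gradient(const Loss& f, std::vector<double> x,
                        std::size_t i, double step) {
  x[i] += step;
  const double plus = f(x);
  x[i] -= 2.0 * step;
  const double minus = f(x);
  return (plus - minus) / (2.0 * step);
}

// Largest relative gap between the analytic and numeric gradients.
// The scale is floored at 1.0 so near-zero gradients do not blow up the ratio.
double max_relative_error(const Loss& f, const std::vector<double>& x,
                          const std::vector<double>& analytic, double step) {
  double worst = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double numeric = numeric_gradient(f, x, i, step);
    const double scale =
        std::max({std::fabs(analytic[i]), std::fabs(numeric), 1.0});
    worst = std::max(worst, std::fabs(analytic[i] - numeric) / scale);
  }
  return worst;
}
```

A small maximum error suggests the computed gradients are merely less precise than the test threshold expects, while a large one points to a genuinely wrong backward pass. The step size and the acceptable threshold are judgment calls that depend on the floating-point precision in use.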