jolibrain / deepdetect

Deep Learning API and Server in C++14 with support for PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
https://www.deepdetect.com/

Build with GPU fails on Ubuntu 16.04 #318

Closed roysG closed 7 years ago

roysG commented 7 years ago

Configuration

Your question / the problem you're facing:

I am trying to build DeepDetect with GPU support, as instructed. I also set the path: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64

I am using a GPU; this is the info from the console:

roy-G41Dx:~/deepdetect/build$ nvidia-smi

Mon May 29 04:44:50 2017
+------------------------------------------------------+
| NVIDIA-SMI 340.102    Driver Version: 340.102        |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 220      Off  | 0000:01:00.0     N/A |                  N/A |
| 35%   41C    P12    N/A / N/A |    333MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

roy-G41Dx:~/deepdetect/build$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Error message (if any) / steps to reproduce the problem:

make

[  3%] Performing configure step for 'caffe_dd'
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/layers/smooth_l1_loss_layer.cpp
CXX src/caffe/layers/softmax_infogain_loss_layer.cpp
src/caffe/layers/softmax_infogain_loss_layer.cpp: In member function ‘Dtype caffe::SoftmaxWithInfogainLossLayer::get_normalizer(caffe::LossParameter_NormalizationMode, int) [with Dtype = float]’:
src/caffe/layers/softmax_infogain_loss_layer.cpp:111:43: warning: ‘normalizer’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   return std::max(Dtype(1.0), normalizer);
                                          ^
src/caffe/layers/softmax_infogain_loss_layer.cpp: In member function ‘Dtype caffe::SoftmaxWithInfogainLossLayer::get_normalizer(caffe::LossParameter_NormalizationMode, int) [with Dtype = double]’:
src/caffe/layers/softmax_infogain_loss_layer.cpp:111:43: warning: ‘normalizer’ may be used uninitialized in this function [-Wmaybe-uninitialized]
CXX src/caffe/layers/rnn_layer.cpp
CXX src/caffe/layers/tile_layer.cpp
CXX src/caffe/layers/prelu_layer.cpp
CXX src/caffe/layers/slice_layer.cpp
CXX src/caffe/layers/silence_layer.cpp
CXX src/caffe/layers/threshold_layer.cpp
CXX src/caffe/layers/bnll_layer.cpp
CXX src/caffe/layers/filter_layer.cpp
CXX src/caffe/layers/lstm_unit_layer.cpp
cc1plus: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
Makefile:576: recipe for target '.build_release/src/caffe/layers/lstm_unit_layer.o' failed
make[3]: *** [.build_release/src/caffe/layers/lstm_unit_layer.o] Error 1
make[3]: *** Waiting for unfinished jobs....
CMakeFiles/caffe_dd.dir/build.make:108: recipe for target 'caffe_dd/src/caffe_dd-stamp/caffe_dd-configure' failed
make[2]: *** [caffe_dd/src/caffe_dd-stamp/caffe_dd-configure] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/caffe_dd.dir/all' failed
make[1]: *** [CMakeFiles/caffe_dd.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Thanks.

beniz commented 7 years ago

Your compiler is crashing! One likely reason is a very low amount of available RAM. How much RAM does your machine have?

roysG commented 7 years ago

I run the compiler on an Ubuntu desktop; should I switch to Ubuntu server? Does it matter? The computer is a dual-core 2.3 GHz with 4 GB of DDR2 RAM; I only use this computer for tests before I buy a stronger one.

beniz commented 7 years ago

It is very likely you need more RAM. Also, you can try removing all the -j${N} flags in deepdetect/CMakeLists.txt and running your cmake .. and make again.
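Something along these lines should do it (a sketch, assuming the usual deepdetect/build layout; you can also edit CMakeLists.txt by hand):

cd ~/deepdetect
# strip the parallel-make flag from the external Caffe build commands
sed -i 's/ -j\${N}//g' CMakeLists.txt
cd build
cmake ..
make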

roysG commented 7 years ago

OK, I'm trying it now. How do you explain that on a VPS with 2 CPUs and 4 GB of RAM, without a GPU, it compiled successfully?

beniz commented 7 years ago

I can't explain what people do on their machines... Our builds are tested on servers, laptops and specialized platforms, but we cannot accommodate all combinations of hardware.

roysG commented 7 years ago

I am trying to remove all the -j${N} as you told me, but I don't know exactly what I should remove in these lines:

Makefile.config && ${CMAKE_COMMAND} -E env PATH=${CMAKE_BINARY_DIR}/protobuf/bin:$ENV{PATH} make CUDA_ARCH=${CUDA_ARCH} -j${N}
INSTALL_COMMAND ""
BUILD_IN_SOURCE 1
)

URL https://github.com/beniz/caffe/archive/master.tar.gz
CONFIGURE_COMMAND ln -sf Makefile.config.gpu.cudnn Makefile.config && echo "OPENCV_VERSION:=${OPENCV_VERSION}" >> Makefile.config && make CUDA_ARCH=${CUDA_ARCH} -j${N}
INSTALL_COMMAND ""
BUILD_IN_SOURCE 1
)

URL https://github.com/beniz/caffe/archive/master.tar.gz
CONFIGURE_COMMAND ln -sf Makefile.config.gpu Makefile.config && echo "OPENCV_VERSION:=${OPENCV_VERSION}" >> Makefile.config && ${CMAKE_COMMAND} -E env PATH=${CMAKE_BINARY_DIR}/protobuf/bin:$ENV{PATH} make CUDA_ARCH=${CUDA_ARCH} -j${N}
INSTALL_COMMAND ""
BUILD_IN_SOURCE 1
)
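Is it only the trailing -j${N} after each make CUDA_ARCH=${CUDA_ARCH} that should go? For example (my guess, taking the second line above):

# before
CONFIGURE_COMMAND ln -sf Makefile.config.gpu.cudnn Makefile.config && echo "OPENCV_VERSION:=${OPENCV_VERSION}" >> Makefile.config && make CUDA_ARCH=${CUDA_ARCH} -j${N}
# after
CONFIGURE_COMMAND ln -sf Makefile.config.gpu.cudnn Makefile.config && echo "OPENCV_VERSION:=${OPENCV_VERSION}" >> Makefile.config && make CUDA_ARCH=${CUDA_ARCH}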

roysG commented 7 years ago

Maybe this can also help you understand my problem; this is my memory:

free -m
              total        used        free      shared  buff/cache   available
Mem:           3007        1132        1190          33         684        1640
Swap:          3062          25        3037

I also installed CUDA without OpenGL (when it asked me); does this matter?

roysG commented 7 years ago

I removed the -j${N} as you advised; it looks better, but the build still fails. Now it fails on:

CXX/LD -o .build_release/examples/mnist/convert_mnist_data.bin
[ 21%] Performing build step for 'caffe_dd'
[ 25%] No install step for 'caffe_dd'
[ 28%] Completed 'caffe_dd'
[ 28%] Built target caffe_dd
Scanning dependencies of target ddetect
[ 32%] Building CXX object src/CMakeFiles/ddetect.dir/deepdetect.cc.o
[ 35%] Building CXX object src/CMakeFiles/ddetect.dir/caffelib.cc.o
In file included from /home/roy/deepdetect/src/outputconnectorstrategy.h:112:0,
                 from /home/roy/deepdetect/src/caffelib.cc:24:
/home/roy/deepdetect/src/supervisedoutputconnector.h:46:30: internal compiler error: in ggc_set_mark, at ggc-page.c:1541
     :_label(label),_loss(loss) {}
                              ^
Please submit a full bug report, with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
src/CMakeFiles/ddetect.dir/build.make:86: recipe for target 'src/CMakeFiles/ddetect.dir/caffelib.cc.o' failed
make[2]: *** [src/CMakeFiles/ddetect.dir/caffelib.cc.o] Error 1
CMakeFiles/Makefile2:122: recipe for target 'src/CMakeFiles/ddetect.dir/all' failed
make[1]: *** [src/CMakeFiles/ddetect.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Can you help me ? @beniz

beniz commented 7 years ago

Look at the dmesg output on the command line. If the system reports that your compiler died from not having enough memory, you need to upgrade your hardware spec.
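For example (the exact message wording varies with the kernel version):

# check whether the kernel's OOM killer terminated the compiler
dmesg | grep -iE 'out of memory|killed process|oom'
# and how much memory/swap is left right now
free -m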

roysG commented 7 years ago

Excellent, it looks good now. Yes, I am 100 percent sure it happened because of low memory, many thanks!! I want to use the GPU for prediction. I just need to pass the parameter that tells it to use the GPU, right?

One more important thing: does the GPU work asynchronously compared to the CPU?

I mean, as I checked with the CPU, the next call starts only when the previous call is done; does the GPU work differently, or is it the same but just faster?
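About the GPU parameter, is it something like this in the predict call (just my guess from the API docs; the service name and image path are placeholders)?

curl -X POST "http://localhost:8080/predict" -d '{
  "service": "myservice",
  "parameters": {
    "mllib": { "gpu": true }
  },
  "data": ["/path/to/test.jpg"]
}'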

cchadowitz commented 7 years ago

Just a note - it looks like you're using a very old version of DeepDetect based on the commit shown when the server starts (b3c811f). Also, from your nvcc --version output it looks like you may have CUDA 7.5, not CUDA 8.0, yet you're pointing the CUDA path at /usr/local/cuda-8.0/lib64.

That may not have anything to do with the current problems you're having, but it probably doesn't help.
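You could double-check which toolkit your shell actually resolves, something like this (the export lines assume 8.0 really is installed under /usr/local/cuda-8.0):

which nvcc                  # probably points at the 7.5 toolkit
ls -d /usr/local/cuda*      # list the installed toolkit directories
# if 8.0 is installed but not first on your PATH:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH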

roysG commented 7 years ago

About CUDA: yes, it looks very strange. I installed 8.0, and the folder is indeed cuda-8.0, but when I run nvcc --version it shows CUDA 7.5.

About the commit, this is the one I'm working with:

DeepDetect [ commit e99ee48f94678214b0a84a063059437056558032 ]
INFO - 09:27:40 - Running DeepDetect HTTP server on localhost:8080

Is it still the old one?

Last thing: I want to buy a strong computer with a strong GPU, lots of RAM and a good CPU, because the response times of the calls when I used the CPU were too slow for my needs.

So I think that if I buy this strong computer and use the GPU for prediction, the response calls will be much faster, am I right?

Many thanks.

roysG commented 7 years ago

??