jolibrain / deepdetect

Deep Learning API and Server in C++14, with support for PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
https://www.deepdetect.com/

Future support to Ubuntu 18.04 container on IBM OpenPower + Tesla P100 + CUDA 10.0 #560

Open gustavovaliati opened 5 years ago

gustavovaliati commented 5 years ago

First of all, congratulations for the great work you are doing.

This is a question.

I am aware the supported platforms are Ubuntu 14.04 and 16.04. However, I am wondering whether there has been any progress on supporting 18.04.

I am working on an IBM OpenPower machine with a Tesla P100, running a predefined Docker image built on Ubuntu 18.04 with CUDA 10.0. I am more or less restricted to that setup, and I would like to test the deepdetect server on it. I am working through some compilation problems (like the one reported below) that I think are related to the different platform I am using.

Has anyone tried working in a similar environment? Thank you.

Error message (if any) / steps to reproduce the problem:

-- Configuring incomplete, errors occurred!
See also "/home/myuser/workspace/deepdetect/build/CMakeFiles/CMakeOutput.log".
See also "/home/myuser/workspace/deepdetect/build/CMakeFiles/CMakeError.log".

beniz commented 5 years ago

Hi @gustavovaliati thanks for the kind words.

This is a known difficulty, and the instructions to resolve it are below. However, if you are able to run a Docker image for P100 instead, following DeepDetect P100 docker, that's the preferred (painless) way :)

wget https://github.com/Kitware/CMake/releases/download/v3.14.0/cmake-3.14.0.tar.gz
tar xvzf cmake-3.14.0.tar.gz
cd cmake-3.14.0
./bootstrap
make
sudo make install

Then proceed with the DeepDetect build, following the Build for P100 GPU from source instructions.
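
Before re-running the DeepDetect configure step, it can help to confirm that the freshly built cmake is the one the shell actually picks up. The following is a minimal sketch, not part of the official instructions: the `version_ge` helper is hypothetical, it relies on GNU `sort -V`, and using 3.14.0 as the threshold is only an assumption based on the version built above.

```shell
# version_ge A B: succeed when version A is at least version B (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 3.14.0 is the version built above; treating it as the minimum is an assumption.
wanted="3.14.0"

# Drop any cached location of an older cmake binary.
hash -r 2>/dev/null || true

if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" "$wanted"; then
        echo "cmake $found at $(command -v cmake): OK"
    else
        echo "cmake $found at $(command -v cmake) is older than $wanted"
    fi
else
    echo "cmake not found on PATH"
fi
```

If an older version is still reported, the distribution's cmake is probably ahead of the newly installed one on PATH.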

We'll update the online documentation accordingly.

gustavovaliati commented 5 years ago

Thank you for such a complete and quick response.

Great to know I was going to do similar procedures to solve the situation. As soon as I test it, I will report back here. Cya.

gustavovaliati commented 5 years ago

Hi! With some additional steps to your instructions I have been able to overcome the initial problem. The changes are:

Thank you! :+1:

Right now I am working on some problems when compiling the tests with cmake -DBUILD_TESTS=ON .. && make.

I have configured the initial deepdetect build not to use cuDNN, since I don't have it: cmake .. -DUSE_SIMSEARCH=ON -DUSE_CUDNN=OFF -DCUDA_ARCH="-gencode arch=compute_60,code=sm_60" -DUSE_TF=ON -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF

Additionally, it is asking for bazel even though it is already installed.

||/ Name                                          Version                     Architecture                Description
+++-=============================================-===========================-===========================-===============================================================================================
ii  bazel                                         0.15.0-14232.d68440f11      ppc64el                     Correct, reproducible, fast builds for everyone

Current error:

compile_linux_protobuf.sh finished successfully!!!
tensorflow/contrib/makefile/downloads/nsync/builds/default.linux.c++11/nsync.a
Using CUDA from /usr/local/cuda
CUDA support enabled
sed: can't read /usr/local/cuda/include/cudnn.h: No such file or directory
Cannot find bazel. Please install bazel.
Configuration finished
./build_tensorflow.sh: line 52: bazel: command not found
CMakeFiles/tensorflow_shared_gpu.dir/build.make:107: recipe for target 'tensorflow-stamp/tensorflow_shared_gpu-configure' failed
make[5]: *** [tensorflow-stamp/tensorflow_shared_gpu-configure] Error 127
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/tensorflow_shared_gpu.dir/all' failed
make[4]: *** [CMakeFiles/tensorflow_shared_gpu.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/tensorflow_cc.dir/build.make:106: recipe for target 'tensorflow_cc/src/tensorflow_cc-stamp/tensorflow_cc-configure' failed
make[2]: *** [tensorflow_cc/src/tensorflow_cc-stamp/tensorflow_cc-configure] Error 2
CMakeFiles/Makefile2:146: recipe for target 'CMakeFiles/tensorflow_cc.dir/all' failed
make[1]: *** [CMakeFiles/tensorflow_cc.dir/all] Error 2
Makefile:94: recipe for target 'all' failed
make: *** [all] Error 2

As soon as I have any solution for that I am going to report it here.
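
The log above shows two separate symptoms: `bazel: command not found` although dpkg lists the package, and a missing `cudnn.h`. A rough diagnostic sketch for both follows. The `check_tool` helper is hypothetical, and the header path is simply the one printed in the log; the actual cause on this machine is not known from the thread.

```shell
# check_tool NAME: print where NAME resolves on PATH, or fail if it does not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not on PATH"
        return 1
    fi
}

# If bazel is not on PATH, list any executables the installed package ships,
# so their directory can be compared against PATH. Prints nothing if none are found.
check_tool bazel || {
    dpkg -L bazel 2>/dev/null | grep -E '/bin/' || true
}

# The sed error in the log points at this header.
[ -f /usr/local/cuda/include/cudnn.h ] || echo "cudnn.h not found under /usr/local/cuda/include"
```

If the package does ship a bazel binary, the build is likely running with a PATH that does not include its directory.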

beniz commented 5 years ago

Hi, unless you really need TF for some specific image model, I'd recommend building without it, at least at first. That being said, I'm pretty certain you need bazel 0.8 explicitly, otherwise it won't build, see https://github.com/jolibrain/deepdetect/blob/master/docker/gpu-caffe-tf/Dockerfile

You also may want to keep cudnn support on, it's the default and most useful configuration, unless there's some good reason to deactivate it.

panovr commented 5 years ago

For Ubuntu 18.04, I also need to install libssl-dev package.

beniz commented 5 years ago

I remember having seen it missing here and there; it might depend on the base OS install.

panovr commented 5 years ago

By the way, for Ubuntu 18.04, do we still need to use the libcurlpp version from github https://github.com/jpbarrette/curlpp.git like in Ubuntu 16.04?

beniz commented 5 years ago

libcurlpp has been fixed in 18.04, let us know if the instructions are not clear: https://www.deepdetect.com/quickstart-server/?opts={%22os%22:%22ubuntu%22,%22source%22:%22build_source%22,%22compute%22:%22gpu%22,%22gpu%22:%22gtx%22,%22backend%22:[%22caffe%22,%22tsne%22,%22xgboost%22],%22deepdetect%22:%22server%22}

YaYaB commented 4 years ago

I had some difficulty installing DD on Ubuntu 18.04: I had to install libboost-all-dev and libssl-dev before installing cpp-netlib. I finally wrote a bash script to automate the installation based on the commit version. https://gist.github.com/YaYaB/7d5b117d4a9976b73201f7fb28eaae95
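
For anyone who wants to check for these two packages before starting the build, here is a small sketch. It is not taken from the gist above; the `missing_pkgs` helper is hypothetical, and the package list covers only the two names mentioned in this thread.

```shell
# missing_pkgs PKG...: print each package that dpkg does not report as installed.
missing_pkgs() {
    for pkg in "$@"; do
        dpkg -s "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

missing="$(missing_pkgs libboost-all-dev libssl-dev)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names are printed on one line.
    echo "Missing packages; install with: sudo apt-get install -y" $missing
else
    echo "libboost-all-dev and libssl-dev are already installed"
fi
```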

cchadowitz commented 4 years ago

FYI I have a rough Dockerfile that sets up DD on Ubuntu 18.04 (with caffe, TF, dlib backends) here: https://github.com/jolibrain/deepdetect/issues/687#issuecomment-572749679

Just make sure to use TF v1.13.1 and bazel v0.21.0 to avoid the errors I had in #687 :)