opendr-eu / opendr

A modular, open and non-proprietary toolkit for core robotic functionalities by harnessing deep learning
Apache License 2.0
614 stars, 95 forks

Nanodet C api fails on docker #414

Closed: ad-daniel closed this issue 1 year ago

ad-daniel commented 1 year ago

While building the Docker image, compilation of the C API fails:

Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
make[1]: Entering directory '/opendr/src/c_api'
Building C API...
g++ -fPIC -c opendr_utils.cpp -o /opendr/build/opendr_utils.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include
g++ -fPIC -c activity_recognition_x3d.cpp -o /opendr/build/opendr_x3d.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include
g++ -fPIC -c face_recognition.cpp -o /opendr/build/opendr_face_recognition.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include
g++ -fPIC -c lightweight_open_pose.cpp -o /opendr/build/opendr_open_pose.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include
g++ -fPIC -c object_detection_2d_detr.cpp -o /opendr/build/opendr_detr.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include
g++ -fPIC -c object_detection_2d_nanodet_jit.cpp -o /opendr/build/opendr_nanodet_jit.o -I/usr/local/include/onnxruntime/ -I/usr/local/include/rapidjson/ `pkg-config --cflags opencv4` -I/opendr/include -I/usr/local/libtorch/include -I/usr/local/libtorch/include/torch/csrc/api/include
object_detection_2d_nanodet_jit.cpp:18:10: fatal error: torch/script.h: No such file or directory
   18 | #include <torch/script.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
make[1]: Leaving directory '/opendr/src/c_api'
make[1]: *** [Makefile:49: /opendr/lib/libopendr.so] Error 1
make: *** [Makefile:56: libopendr] Error 2
Removing intermediate container bf6c2d319b36

This doesn't prevent the Docker image itself from being built, since everything is green, but I doubt NanoDet C inference would work in this case. Surprisingly, it only seems to happen in Docker: when building from source (by setting the "test tools" label), it doesn't fail, so the problem appears to be Docker-specific.
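Because the image still builds, a pre-flight check can make this failure explicit before g++ runs. The sketch below assumes libtorch is unpacked under /usr/local/libtorch, as the -I flags on the nanodet_jit compile line in the log suggest, and tests for both include locations that line needs. The function name libtorch_headers_present is hypothetical and not part of the OpenDR Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the OpenDR build).
# Succeeds only if both include paths passed to g++ for
# object_detection_2d_nanodet_jit.cpp exist under the given libtorch prefix.
libtorch_headers_present() {
  local prefix="${1:-/usr/local/libtorch}"
  # <torch/script.h> is resolved via -I<prefix>/include
  [ -f "${prefix}/include/torch/script.h" ] &&
    # the C++ frontend headers are resolved via this second -I path
    [ -d "${prefix}/include/torch/csrc/api/include" ]
}
```

For example, `libtorch_headers_present || echo "libtorch headers missing" >&2` could run before `make` so a missing libtorch shows up as a clear message rather than a compiler error buried in the log.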

@ManosMpampis, can you perhaps investigate?

ad-daniel commented 1 year ago

I've checked, and the failure also occurred in the release, in both the CPU and CUDA images, so it's not CPU-specific.

ManosMpampis commented 1 year ago

Which Docker image are you trying? Is it one of the embedded versions or the PC version?

ad-daniel commented 1 year ago

PC version, built from this Dockerfile with:

docker build --build-arg branch=master --file Dockerfile .

ManosMpampis commented 1 year ago

I will try installing it myself and check all the logs.

ManosMpampis commented 1 year ago

I just installed the CPU version. The Docker image does not install the zip dependency from apt, so the script cannot install libtorch.

As a workaround, remove the libtorch directory /usr/local/libtorch and the file /usr/local/lib/libtorchvision.so, then install zip (sudo apt install zip) and run install_torch_c_api.sh again.
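The workaround above can be sketched as two shell functions. This is an illustration, not a script from the repository: the function names are hypothetical, the optional ROOT argument exists only so the cleanup can be tried in a scratch directory, and the path to install_torch_c_api.sh is passed in because its location in the source tree is not stated here.

```bash
#!/usr/bin/env bash
# Remove a partial libtorch install. ROOT (hypothetical) defaults to ""
# which means the real filesystem.
remove_partial_libtorch() {
  local root="${1:-}"
  rm -rf "${root}/usr/local/libtorch"
  rm -f "${root}/usr/local/lib/libtorchvision.so"
}

# Full workaround; run as root inside the container.
# $1 is the path to install_torch_c_api.sh.
fix_libtorch_install() {
  local installer="$1"
  if [ ! -f "$installer" ]; then
    echo "installer not found: $installer" >&2
    return 1
  fi
  remove_partial_libtorch ""
  # zip was missing from the image's apt dependencies
  apt-get update && apt-get install -y zip
  bash "$installer"
}
```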

For the CUDA version, a different CUDA release must be installed, because CUDA 11.2 and libtorch are not compatible. When the script runs normally, it warns the user that CUDA-enabled libtorch was not installed properly and explains what to do, with references to the relevant GitHub issues and a table of the available wheels. I would prefer to make CUDA 11.6, or even better 11.7, the default, which would also make things easier on newer Ubuntu versions, but that is a separate issue.
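A script could detect the installed CUDA release before choosing a libtorch build. The sketch below parses the "release X.Y" field from `nvcc --version` output and checks it against the 11.6/11.7 default proposed in this comment. Both function names are hypothetical, and the allowed list encodes only that preference, not libtorch's official compatibility matrix.

```bash
#!/usr/bin/env bash
# Print the CUDA release (e.g. "11.2") found in `nvcc --version` output on stdin.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Succeed only for the releases proposed above as the new default.
cuda_release_preferred() {
  case "$1" in
    11.6|11.7) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `cuda_release_preferred "$(nvcc --version | cuda_release_from_nvcc)" || echo "consider CUDA 11.6 or 11.7" >&2`.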

On embedded devices the installation is different; I have informed Pavlos Tosidis on how to install and run libtorch there, so we should not run into the same issues.

ManosMpampis commented 1 year ago

@ad-daniel Can you try running the Docker installation with the docker and docker-cuda files on the new branch docker_fixes_libtorch? I have made some changes so that it should install libtorch correctly in both the CUDA and CPU versions. I also changed the version of the ONNX C installation, after testing that all scripts work correctly, because in the GPU-enabled installation ONNX 1.6.0 required CUDA 10.2 and would not compile for me.
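Testing both images from the branch amounts to one `docker build` per file, reusing the command from earlier in this thread. The sketch assumes the two files are named Dockerfile and Dockerfile-cuda at the repository root; the function name and the DOCKER override (used for a dry run, e.g. DOCKER=echo) are hypothetical. Each build's output is saved to a log file so it can be inspected afterwards.

```bash
#!/usr/bin/env bash
# Build the CPU and CUDA images from the given branch and keep each log.
build_opendr_images() {
  local branch="$1"
  local docker="${DOCKER:-docker}"
  local file
  for file in Dockerfile Dockerfile-cuda; do
    "$docker" build --build-arg "branch=${branch}" --file "$file" . \
      2>&1 | tee "build-${file}.log"
    # tee masks docker's exit status, so check the first pipeline stage
    if [ "${PIPESTATUS[0]}" -ne 0 ]; then
      echo "docker build failed for ${file}" >&2
      return 1
    fi
  done
}
```

Usage: `build_opendr_images docker_fixes_libtorch`. Note that a zero exit status is not sufficient here: in this issue the image built successfully even though `make` failed, so the saved logs still need to be read.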

ad-daniel commented 1 year ago

I think it's easier if you open a draft pull request and set the "test release" label. By checking the logs of the "build docker" job, you can immediately see whether there are problems during compilation. That way I can also see the changes you mention.
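Since the job can stay green while `make` fails inside the image, scanning a saved build log is one way to surface such problems. The sketch below is a hypothetical helper, not part of the CI: it prints and reports lines matching the two error forms seen in the log quoted at the top of this issue ("fatal error:" from g++ and "make: ***" / "make[1]: ***").

```bash
#!/usr/bin/env bash
# Print matching error lines (with line numbers) and succeed if any were found.
build_log_has_errors() {
  grep -En 'fatal error:|make(\[[0-9]+\])?: \*\*\*' "$1"
}
```

Usage: `if build_log_has_errors build.log; then exit 1; fi` turns a silently broken C API build into a failed job.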

ManosMpampis commented 1 year ago

I can do that, but with this method I can't be sure about the CUDA installation, am I wrong?

passalis commented 1 year ago

#418 has been merged, so this should be resolved. Just to be safe, we should recheck the new Docker images before the release, since updates might happen in between.