Closed LiShuaixin closed 5 years ago
Hi. When you're building from source, I think there's no reason to set _GLIBCXX_USE_CXX11_ABI to false. Have you tried building with true?
@peci1 thanks for your reply. Yes, I tried building without the _GLIBCXX_USE_CXX11_ABI flag, which defaults to true on Ubuntu 16.04 and selects the new GCC ABI. However, the problem is not solved. This time the errors are related to tensorflow_cc.so. I don't know whether the TF version is the problem. I use TF 1.10, and I will try rebuilding with TF 1.9 or 1.8.
Hi @peci1, I rebuilt the whole project with TF 1.9, but that still did not solve the issue. Before the undefined reference errors, there is a warning as below:
/usr/bin/ld: warning: libtensorflow_framework.so, needed by /home/lee/segmap_ws/devel/lib/libtensorflow_cc.so, not found (try using -rpath or -rpath-link)
followed by a lot of undefined reference errors:
/home/lee/my_ws/devel/lib/libtensorflow_cc.so:undefined reference to ‘tensorflow::XXXXXXXXX'
which suggests libtensorflow_cc.so was not built correctly. I think linking libtensorflow_cc.so to libtensorflow_framework.so may solve this problem. However, I'm not very familiar with this and don't know how to do it. Looking forward to your reply!
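The linker warning above means that libtensorflow_cc.so lists a dependency that cannot be resolved at link time. As a minimal sketch, the unresolved dependencies can be pulled out of ldd output with a small filter. The function name missing_libs is hypothetical and not part of tensorflow_ros_cpp.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on saved output.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example usage (path taken from the warning in this thread; adjust it):
#   ldd /home/lee/segmap_ws/devel/lib/libtensorflow_cc.so | missing_libs
```

An empty result would suggest that every dependency resolves in the current environment.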
Have you followed the tutorial at https://github.com/tradr-project/tensorflow_ros_cpp#custom-compilation-of-tensorflow-using-bazel and set TF_BAZEL_LIBRARY to the exact absolute path to the library?
Yes, I compiled the tensorflow_ros_cpp package using the command:
catkin build tensorflow_ros_cpp --cmake-args -DFORCE_TF_PIP_SEARCH="OFF" -DFORCE_TF_BAZEL_SEARCH="ON" -DTF_BAZEL_LIBRARY="/home/lee/tensorflow/bazel-bin/tensorflow/libtensorflow_cc.so" -DTF_BAZEL_SRC_DIR="/home/lee/tensorflow" -DTF_PYTHON_VERSION="2.7.12" -DTF_PIP_PATH="$HOME/segmappyenv/lib/python2.7/site-packages/tensorflow" -DTF_PYTHON_LIBRARY="/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0"
Aha... It seems your bazel build uses the --define framework_shared_object=true option, which this library doesn't expect. Do you need that switch?
Here are my bazel-built library dependencies (and they do not contain the framework library):
$ ldd ~/tensorflow/ws/devel/lib/libtensorflow_cc.so
linux-vdso.so.1 (0x00007ffcc0d4a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f48e4e22000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f48e4a84000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f48e4865000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f48e465d000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f48e42d4000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f48e40bc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f48e3ccb000)
/lib64/ld-linux-x86-64.so.2 (0x00007f48ecd6a000)
It's so weird! I checked the original libtensorflow_cc.so in tensorflow root and got:
$ ldd ~/tensorflow/bazel-bin/tensorflow/libtensorflow_cc.so
linux-vdso.so.1 => (0x00007ffdde9ed000)
libtensorflow_framework.so => /home/lee/tensorflow/bazel-bin/tensorflow/../_solib_local/_U_S_Stensorflow_Clibtensorflow_Ucc.so___Utensorflow/libtensorflow_framework.so (0x00007f25aa879000)
libcublas.so.9.1 => /home/lee/tensorflow/bazel-bin/tensorflow/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.9.1 (0x00007f25a7155000)
libcusolver.so.9.1 => /home/lee/tensorflow/bazel-bin/tensorflow/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccusolver___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcusolver.so.9.1 (0x00007f25a19e0000)
libcudart.so.9.1 => /home/lee/tensorflow/bazel-bin/tensorflow/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.9.1 (0x00007f25a1772000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f25a1555000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f25a1351000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25a112f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f25a0e26000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f25a0aa4000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f25a088e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f25a04c4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f25b7abe000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f259f34f000)
libcudnn.so.7 => /usr/local/cuda/lib64/libcudnn.so.7 (0x00007f258aaec000)
libcufft.so.9.1 => /usr/local/cuda/lib64/libcufft.so.9.1 (0x00007f25835ff000)
libcurand.so.9.1 => /usr/local/cuda/lib64/libcurand.so.9.1 (0x00007f257f67c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f257f474000)
libnvidia-fatbinaryloader.so.418.67 => /usr/lib/nvidia-418/libnvidia-fatbinaryloader.so.418.67 (0x00007f257f226000)
Here libtensorflow_cc.so links against libtensorflow_framework.so. However, the copy in my workspace has lost that link, as well as CUDA support:
$ ldd ~/segmap_ws/devel/lib/libtensorflow_cc.so
linux-vdso.so.1 => (0x00007ffc462ad000)
libtensorflow_framework.so => not found
libcublas.so.9.1 => /usr/local/cuda/lib64/libcublas.so.9.1 (0x00007f012911d000)
libcusolver.so.9.1 => /usr/local/cuda/lib64/libcusolver.so.9.1 (0x00007f01239a8000)
libcudart.so.9.1 => /usr/local/cuda/lib64/libcudart.so.9.1 (0x00007f012373a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f012351d000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0123319000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f01230f7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0122dee000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0122a6c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0122856000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f012248c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0138b10000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0122284000)
So I think something went wrong when I built tensorflow_ros_cpp.
Yeah, it lost the link because tensorflow_ros_cpp has to copy (or symlink) all the required libraries to the devel/install space. That's because ROS/catkin doesn't support packages adding entries to LD_LIBRARY_PATH. So you need to get the files onto some path that is already on LD_LIBRARY_PATH, e.g. your devel space's lib folder. However, the bazel mode of this library doesn't expect TensorFlow to be built with the shared framework object, so it doesn't copy it.
A quick solution for you is manually making a symlink to the framework library in devel space.
A correct solution would be to make this library aware of this situation, and if it finds out that the framework object exists, it should also copy it. I won't have time for this fix for a few weeks, but if you manage to fix it correctly, feel free to send a PR.
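The quick workaround could look roughly like the sketch below. The helper name link_framework_lib and both directory arguments are assumptions for illustration. Pass absolute paths, because a symlink stores its target exactly as given.

```shell
# Symlink every libtensorflow_framework.so* file from the bazel output
# directory into the devel space's lib folder.
#   $1: directory containing the framework library (absolute path)
#   $2: devel space lib directory (absolute path)
link_framework_lib() {
    src_dir=$1
    dest_dir=$2
    for lib in "$src_dir"/libtensorflow_framework.so*; do
        # Skip the unexpanded glob pattern if nothing matched.
        [ -e "$lib" ] || continue
        ln -sf "$lib" "$dest_dir/"
    done
}

# Example usage with the paths mentioned in this thread (adjust as needed):
#   link_framework_lib /home/lee/tensorflow/bazel-bin/tensorflow /home/lee/segmap_ws/devel/lib
```

Symlinking rather than copying keeps the devel space in sync when TensorFlow is rebuilt, at the cost of breaking if the bazel output directory is cleaned.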
okay, I'll have a try!
Did you succeed building the program in the end?
Yes! Following your suggestion, I manually copied the framework.cc from tensorflow workspace into the devel/lib before building up the whole project, and the issue was solved then.
FYI: I hope I fixed the behavior in commit 1df7740c3dc (version 3.1.2). You can give it a try after first manually deleting the libtensorflow_framework.so* files from devel/lib and devel/.private/tensorflow_ros_cpp/lib (if you're using catkin tools).
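A minimal sketch of that cleanup step, assuming the workspace layout described above. The helper name remove_framework_libs is hypothetical, and -maxdepth assumes GNU or BSD find.

```shell
# Delete libtensorflow_framework.so* files directly inside each given
# directory. Directories that do not exist are skipped silently.
remove_framework_libs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'libtensorflow_framework.so*' -exec rm -f {} +
    done
}

# Example usage (the workspace path is an assumption; adjust to yours):
#   remove_framework_libs ~/segmap_ws/devel/lib ~/segmap_ws/devel/.private/tensorflow_ros_cpp/lib
```

Other libraries in the same folders, such as libtensorflow_cc.so, are left untouched.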
Thanks for taking the time to look at my problem. Please give me some help on how to solve it.
I was trying to compile the segmap package. The environment is Ubuntu 16.04 + CUDA 9.0 + cuDNN 7.0 + NCCL 2.2. I set up tensorflow 1.8.0 by installing a whl file that I built by compiling from source:
bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --define framework_shared_object=true \
//tensorflow/tools/pip_package:build_pip_package \
//tensorflow:libtensorflow_cc.so \
//tensorflow:libtensorflow_framework.so \
//tensorflow:install_headers
I followed the tutorial to build tensorflow_ros_cpp first, which succeeded. Then I got stuck compiling segmapper.
Here are my commands and the reported error.
catkin build tensorflow_ros_cpp --cmake-args -DFORCE_TF_PIP_SEARCH="ON" -DFORCE_TF_BAZEL_SEARCH="ON" -DFORCE_TF_CATKIN_SEARCH="ON" -DTF_BAZEL_LIBRARY="/home/haotian/tensorflow/tensorflow-1.8.0/bazel-bin/tensorflow/libtensorflow_cc.so" -DTF_BAZEL_SRC_DIR="/home/haotian/tensorflow/tensorflow-1.8.0" -DTF_PYTHON_VERSION="2.7.15" -DTF_PIP_PATH="$HOME/segmappyenv/lib/python2.7/site-packages/tensorflow" -DTF_PYTHON_LIBRARY="/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0"
catkin build segmapper --cmake-args -DFORCE_TF_PIP_SEARCH="ON" -DFORCE_TF_BAZEL_SEARCH="ON" -DFORCE_TF_CATKIN_SEARCH="ON" -DTF_BAZEL_LIBRARY="/home/haotian/tensorflow/tensorflow-1.8.0/bazel-bin/tensorflow/libtensorflow_cc.so" -DTF_BAZEL_SRC_DIR="/home/haotian/tensorflow/tensorflow-1.8.0" -DTF_PYTHON_VERSION="2.7.15" -DTF_PIP_PATH="$HOME/segmappyenv/lib/python2.7/site-packages/tensorflow" -DTF_PYTHON_LIBRARY="/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0"
Reported error:
/usr/bin/ld: warning: libiomp5.so, needed by /home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libmklml_intel.so, needed by /home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so, not found (try using -rpath or -rpath-link)
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutGetMemorySize_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `omp_in_parallel@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `GOMP_barrier@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutCompare_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `omp_get_max_threads@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnConversionCreate_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `omp_get_num_threads@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutCreateFromPrimitive_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutDeserialize_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnConversionExecute_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `cblas_sgemm_batch'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `MKL_Domatcopy'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutCreate_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `MKL_Comatcopy'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutSerializationBufferSize_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `cblas_cgemm_batch'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `GOMP_parallel@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `MKL_Somatcopy'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutDelete_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnConvolutionCreateBackwardBias_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnDelete_F32'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `cblas_zgemm_batch'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnExecute_F32'
/home/haotian/segmap_ws/devel/lib/libtf_graph_executor.so: undefined reference to `tensorflow::ReadBinaryProto(tensorflow::Env*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::MessageLite*)'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `cblas_dgemm_batch'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `omp_get_thread_num@VERSION'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `MKL_Zomatcopy'
/home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so: undefined reference to `dnnLayoutSerialize_F32'
/home/haotian/segmap_ws/devel/lib/libtf_graph_executor.so: undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString[abi:cxx11]()'
collect2: error: ld returned 1 exit status
make[2]: *** [/home/haotian/segmap_ws/devel/lib/segmapper/segmapper_node] Error 1
make[1]: *** [CMakeFiles/segmapper_node.dir/all] Error 2
make: *** [all] Error 2
cd /home/haotian/segmap_ws/build/segmapper; catkin build --get-env segmapper | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -
...........................................................................................................................................................................................................
Failed << segmapper:make [ Exited with code 2 ]
Failed <<< segmapper [ 59.1 seconds ]
[build] Summary: 21 of 22 packages succeeded.
[build] Ignored: 9 packages were skipped or are blacklisted.
[build] Warnings: 12 packages succeeded with warnings.
[build] Abandoned: None.
[build] Failed: 1 packages failed.
[build] Runtime: 37 minutes and 34.7 seconds total.
Please give me some guidance to solve this problem. Thank you so much.
@LiShuaixin @peci1 @dreuter
I currently don't have an Ubuntu 16.04 at hand, but you could try installing libomp-dev. You probably also need to install intel-mkl via the instructions provided here: https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html
I am not sure if that is the correct way though, or if you would need to find these libraries in the bazel build.
Could you maybe run the following commands
find ~/tensorflow/tensorflow-1.8.0/ -name libmklml_intel.so
find ~/tensorflow/tensorflow-1.8.0/ -name libiomp5.so
and post the output?
Also one thing I am wondering about is why you set --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" in the bazel build. Was there a specific reason for doing this?
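The two undefined references reported for libtf_graph_executor.so mention std::__cxx11::basic_string and [abi:cxx11], which appears consistent with an ABI mismatch between that library and a TensorFlow built with _GLIBCXX_USE_CXX11_ABI=0. As a rough heuristic, not a definitive test, a library's symbols can be scanned for those markers. The function name abi_of_symbols is hypothetical.

```shell
# Report whether a symbol listing contains new-ABI (C++11) markers.
# Reads symbols on stdin, e.g. from `nm -D -C <library>`.
abi_of_symbols() {
    if grep -q -e '__cxx11' -e 'abi:cxx11'; then
        echo "new ABI (_GLIBCXX_USE_CXX11_ABI=1)"
    else
        echo "old ABI or no std::string symbols"
    fi
}

# Example usage (path taken from this thread; adjust to your workspace):
#   nm -D -C /home/haotian/segmap_ws/devel/lib/libtensorflow_cc.so | abi_of_symbols
```

If TensorFlow and the packages that link against it report different results, building both with the same ABI setting is the likely direction to investigate.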
Hi Daniel,
Thanks for your reply, and sorry for replying so late. I have a lot of finals going on.
I have compiled the segmap package successfully, but there is another problem that has me stuck.
After sourcing the environment (source ~/segmap_ws/devel/setup.bash), I launch
roslaunch segmapper kitti_loop_closure.launch
and some errors appear in my terminal.
The output looks like the following. It seems the bag does not play properly.
========
PARAMETERS
NODES
  /
    player (rosbag/play)
    segmapper (segmapper/segmapper_node)
    visualizer (rviz/rviz)
auto-starting new master
process[master]: started with pid [23573]
ROS_MASTER_URI=http://localhost:11311
setting /run_id to bbcb6558-3a48-11eb-afd9-10f005c3d552
process[rosout-1]: started with pid [23586]
started core service [/rosout]
process[visualizer-2]: started with pid [23610]
process[player-3]: started with pid [23611]
process[segmapper-4]: started with pid [23612]
/home/haotian/.segmap/kitti/2011_09_30_drive_18.bag
Waiting 0.2 seconds after advertising topics...
[ INFO] [1607536963.583940332]: rviz version 1.12.17
[ INFO] [1607536963.583977813]: compiled against Qt version 5.5.1
[ INFO] [1607536963.583987686]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1607536963.693787873]: Stereo is NOT SUPPORTED
[ INFO] [1607536963.693852549]: OpenGl version: 3 (GLSL 1.3).
 done.
Hit space to toggle paused, or 's' to step.
[PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[ PAUSED ] Bag Time: 1317376479.232045 Duration: 0.000000 / 287.641810
[image: image.png]
Could you please give me help on solving this problem. Thank you so much.
Best, Haotian
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tradr-project/tensorflow_ros_cpp/issues/7#issuecomment-735768253, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALITE2Z6VTCG4DU3GQKJW4LSSOIXBANCNFSM4IKR5J7A .
Thanks Daniel for help :) And thanks Haotian for reporting back. Do you know which steps helped you resolve the issue?
I guess the problem was that you have to build tensorflow_ros_cpp and packages that depend on it in an environment where all libraries required by tensorflow are available. This cannot be automatically done via catkin. I guess that if you'd run a python interactive shell in the same console, tensorflow would fail for you with similar errors. It is, however, weird that tensorflow actually managed to build without installing libomp-dev and other libraries to a system-wide location (which would make them available automatically).
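One way to prepare such an environment, sketched under the assumption that the missing libraries exist somewhere on disk (for example inside the bazel build tree), is to locate them and prepend their directories to LD_LIBRARY_PATH before running catkin build. The helper name add_lib_dirs is hypothetical.

```shell
# Prepend the directory of each named library found under a search root
# to LD_LIBRARY_PATH, skipping directories that are already listed.
#   $1: search root; remaining arguments: library file names
add_lib_dirs() {
    root=$1
    shift
    for name in "$@"; do
        # Use only the first match for each library name.
        path=$(find "$root" -name "$name" -print 2>/dev/null | head -n 1)
        [ -n "$path" ] || continue
        dir=$(dirname "$path")
        case ":${LD_LIBRARY_PATH:-}:" in
            *":$dir:"*) ;;  # already present, nothing to do
            *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
        esac
    done
    export LD_LIBRARY_PATH
}

# Example usage (search root taken from this thread; adjust as needed):
#   add_lib_dirs ~/tensorflow/tensorflow-1.8.0 libmklml_intel.so libiomp5.so
```

The change only affects the current shell, so the build has to be started from the same console afterwards.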
Regarding the other error in segmap, the image that should probably show the error, does not show up on github. Without the image, I don't see any error there.
Hi all,
Thanks for your help. I have compiled the segmap package successfully. I can also reproduce the work in the paper. Thank you guys.
Best wishes! Haotian
Yes! Following your suggestion, I manually copied the framework.cc from tensorflow workspace into the devel/lib before building up the whole project, and the issue was solved then.
Could you please tell me how to produce the file framework.cc and where it is? I can't find it in my tensorflow workspace. Thanks.
Could you please tell me how to produce the file framework.cc and where it is?
That was a typo. It actually refers to the file libtensorflow_framework.so.
Tensorflow (please complete the following information):
Operating System (please complete the following information):
Linux lee-XPS-15-9550 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
ROS (please complete the following information):
Describe the bug
I can build tensorflow_ros_cpp smoothly, and it generates some library files in /my_ws/devel/lib, including lib_prwrap_tensorflow.so. Then I tried compiling other packages that depend on tensorflow_ros_cpp, but encountered undefined reference problems as follows:
It seems the problem is related to a C++ ABI difference. I compiled tensorflow using
bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --define framework_shared_object=true tensorflow:libtensorflow_cc.so //tensorflow/tools/pip_package:build_pip_package
with the flag --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
I compiled tensorflow_ros_cpp using catkin build tensorflow_ros_cpp --cmake-args -DFORCE_TF_PIP_SEARCH="ON"
The problem persists after testing with many versions of TF and CUDA. I've been stuck for many days and still have no idea how to solve it. Could you please give me a hint? Thanks in advance!!
****UPDATE***
I followed the instruction on #4 and solved most of undefined reference problem except: