tradr-project / tensorflow_ros

Project moved to tensorflow_ros_cpp
https://github.com/tradr-project/tensorflow_ros_cpp

Error when linking #4

Open sarlinpe opened 6 years ago

sarlinpe commented 6 years ago

Hi there,

Thank you for the awesome work! I have successfully built the package with Python 3.6 and TF 1.4 (after some minor changes). When I include it in a simple example, linking fails with the following error:

CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `tensorflow::core::RefCounted::~RefCounted()':
test_tensorflow.cc:(.text._ZN10tensorflow4core10RefCountedD2Ev[_ZN10tensorflow4core10RefCountedD5Ev]+0xbd): undefined reference to `tensorflow::internal::LogMessageFatal::LogMessageFatal(char const*, int)'
test_tensorflow.cc:(.text._ZN10tensorflow4core10RefCountedD2Ev[_ZN10tensorflow4core10RefCountedD5Ev]+0xde): undefined reference to `tensorflow::internal::LogMessageFatal::~LogMessageFatal()'
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `std::string* tensorflow::internal::MakeCheckOpString<long, int>(long const&, int const&, char const*)':
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x24): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::CheckOpMessageBuilder(char const*)'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x4b): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::ForVar2()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x66): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x75): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::~CheckOpMessageBuilder()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x89): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::~CheckOpMessageBuilder()'
collect2: error: ld returned 1 exit status
CMakeFiles/test_tensorflow.dir/build.make:88: recipe for target '/ws/devel/lib/example/test_tensorflow' failed
make[2]: *** [/ws/devel/lib/example/test_tensorflow] Error 1
CMakeFiles/Makefile2:249: recipe for target 'CMakeFiles/test_tensorflow.dir/all' failed
make[1]: *** [CMakeFiles/test_tensorflow.dir/all] Error 2
Makefile:126: recipe for target 'all' failed
make: *** [all] Error 2

Could this be due to libtensorflow_cc.so being missing (as mentioned in https://github.com/tensorflow/tensorflow/issues/2412#issuecomment-374147507)? If so, does this mean that linking against the pip-installed TF is a dead end?

Cheers

peci1 commented 6 years ago

Hi, just as a quick try: can you try it with TF 1.3, or do you need 1.4? I remember there were some problems with 1.4, they're changing the layout of the files all the time...

peci1 commented 6 years ago

And please specify the operating system and ROS version you're trying on.

sarlinpe commented 6 years ago

Thanks for the quick reply. I am using CentOS 7.4 and the latest version of Catkin (without ROS).

It turns out that the binaries are built in Release mode, so -DCMAKE_BUILD_TYPE=Release is required. I now face some other issues:

CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `main':
test_tensorflow.cc:(.text.startup+0x3e): undefined reference to `tensorflow::SessionOptions::SessionOptions()'
test_tensorflow.cc:(.text.startup+0x52): undefined reference to `tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**)'
test_tensorflow.cc:(.text.startup+0x69): undefined reference to `tensorflow::Status::ToString() const'
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `tensorflow::SessionOptions::~SessionOptions()':
test_tensorflow.cc:(.text._ZN10tensorflow14SessionOptionsD2Ev[_ZN10tensorflow14SessionOptionsD5Ev]+0xd): undefined reference to `tensorflow::ConfigProto::~ConfigProto()'
collect2: error: ld returned 1 exit status
CMakeFiles/test_tensorflow.dir/build.make:88: recipe for target '/ws/devel/lib/example/test_tensorflow' failed
CMakeFiles/Makefile2:249: recipe for target 'CMakeFiles/test_tensorflow.dir/all' failed
make[2]: *** [/ws/devel/lib/example/test_tensorflow] Error 1
make[1]: *** [CMakeFiles/test_tensorflow.dir/all] Error 2
Makefile:126: recipe for target 'all' failed
make: *** [all] Error 2

It fails the same way with TF 1.3 and 1.7. https://github.com/tensorflow/tensorflow/issues/14632#issuecomment-345358750 suggests that these symbols are in libtensorflow_cc.so, which is not included in the latest pip binaries of the versions I tried. Maybe it was previously?

peci1 commented 6 years ago

No, I think libtensorflow_cc.so should definitely not be needed, because the only way to get this library is to either download it from 3rd party sources, or to build it yourself. Google doesn't distribute it.

On my computer with TF 1.3, the _pywrap_tensorflow library definitely contains the symbols you report as undefined. Can you check on your system?

$ nm -CD lib_pywrap_tensorflow.so | grep NewSession
0000000000ff4c60 T TF_NewSession
0000000000ff2d00 T TF_NewSessionOptions
00000000029f25c0 T tensorflow::NewSession(tensorflow::SessionOptions const&)
00000000029f26b0 T tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**)
0000000000ec7660 W tensorflow::GrpcSessionFactory::NewSession(tensorflow::SessionOptions const&)
0000000002996e40 W tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&)
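
To run this check for every symbol in a linker log at once, something like the following might help. It is only a sketch and not part of the package: the helper names `missing_symbols` and `exported_symbols` are made up for illustration, and it assumes the log uses the GNU ld message format shown earlier in this thread and that the symbol table is the text output of `nm -CD`.

```python
import re

# Matches GNU ld messages such as: undefined reference to `foo::bar(int)'
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")

# One line of `nm -CD` output for a defined symbol: address, type letter, name.
NM_LINE = re.compile(r"^\s*[0-9a-fA-F]+\s+([A-Za-z])\s+(.+?)\s*$")

# Type letters treated as "defined" here. This is a simplification of nm's full table.
DEFINED_TYPES = {"T", "W", "V", "D", "B", "R"}


def missing_symbols(linker_log):
    """Return the unique symbols reported as undefined, in order of appearance."""
    seen = []
    for name in UNDEFINED_REF.findall(linker_log):
        if name not in seen:
            seen.append(name)
    return seen


def exported_symbols(nm_output):
    """Return the set of defined, demangled symbol names from `nm -CD` output."""
    defined = set()
    for line in nm_output.splitlines():
        match = NM_LINE.match(line)
        # Undefined symbols have no address column, so they never match NM_LINE.
        if match and match.group(1) in DEFINED_TYPES:
            defined.add(match.group(2))
    return defined


def still_missing(linker_log, nm_output):
    """Return the undefined symbols that the inspected library does not export."""
    exported = exported_symbols(nm_output)
    return [name for name in missing_symbols(linker_log) if name not in exported]
```

If `still_missing` returns an empty list, the library does export everything the linker asked for, which points towards a link-line or ABI problem rather than a missing library. The comparison is by exact demangled name, so it can be fooled by differences in how names are demangled.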

One more thing to check is whether you're building this package with the C++11 ABI enabled, because all the pip-installed libraries are built using the old ABI. If you point the above nm command at another library or executable built in the same project, that file uses the C++11 ABI iff grepping for __cxx11 yields some results.
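
The `__cxx11` check can also be scripted. The snippet below is a rough sketch under a few assumptions: `nm` from binutils is on the PATH, Python 3.7 or newer is used, and the function names `dynamic_symbols` and `uses_cxx11_abi` are hypothetical helpers, not part of any package mentioned here.

```python
import subprocess


def dynamic_symbols(path):
    """Return the demangled dynamic symbol table of a binary, as printed by `nm -CD`."""
    result = subprocess.run(
        ["nm", "-CD", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def uses_cxx11_abi(nm_output):
    """Return True if any symbol mentions the __cxx11 inline namespace."""
    return any("__cxx11" in line for line in nm_output.splitlines())
```

For example, `uses_cxx11_abi(dynamic_symbols("/path/to/your/library.so"))` would report the result for one file (the path is a placeholder). A negative result is only informative if the file's symbols involve types affected by the ABI switch, such as `std::string`.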

moorage commented 6 years ago

I'm running into a (similar?) issue with TF 1.7 that I compiled myself (necessary to do for the NVIDIA drivers I'm running). Do we have to build TF with C++11 ABI enabled, tensorflow_ros package with C++11 ABI enabled, or both?

peci1 commented 6 years ago

@moorage It depends heavily on the system where you're running the package. Recently I was compiling ROS Indigo on Debian Stretch, and found out that all the system libraries use the new ABI, so I needed to recompile everything with the new ABI. So I would say it's best to compile everything with the same ABI your system packages use.

moorage commented 6 years ago

Would "recompile everything" include ros as well (kinetic in our case)?

peci1 commented 6 years ago

If you're on Ubuntu 16.04, ROS Kinetic from the official repos is compiled with the new ABI. So it should be sufficient to compile TF with the new ABI. But I suspect the pip-wheel bazel target may automatically select the old ABI, because (AFAIK) all pip libraries are still being compiled with the old ABI.

I'm now working on proper support for custom builds of TF using the libtensorflow_cc.so library. You can try waiting for it.

peci1 commented 6 years ago

Anyway, I hope you've noticed there's a kinetic-devel branch in tensorflow_ros_test. This branch shows how to work around the case where the TF and ROS ABIs differ.

That would apply for pip-installed TF on Xenial, for example, or custom compiled TF with the old ABI on Xenial. If you compiled TF yourself with the new ABI (which I'm not sure is possible with the pywrap_tensorflow library), you should go with the master branch.

sarlinpe commented 6 years ago

@peci1 I indeed have these symbols in _pywrap_tensorflow, and it seems that I'm using the old ABI.

Turns out that I had accidentally commented out

LIBRARIES ${TENSORFLOW_LIBRARIES} python2.7 # yes, we also need to link against python...

Changing python2.7 to python3 does the trick. Thanks!
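
For anyone unsure which Python library name matches their interpreter, the interpreter can be asked directly instead of guessing. This is a small sketch, not something from the package itself: `python_link_name` is a made-up helper, and it assumes a typical POSIX build of CPython where the `LDVERSION` config variable is set.

```python
import sys
import sysconfig


def python_link_name():
    """Return a library name such as 'python3.6m' for the running interpreter."""
    # LDVERSION is set on typical POSIX builds of CPython (e.g. '3.6m').
    # It may be missing elsewhere, so fall back to plain major.minor.
    version = sysconfig.get_config_var("LDVERSION")
    if not version:
        version = "{}.{}".format(*sys.version_info[:2])
    return "python" + version


if __name__ == "__main__":
    print(python_link_name())
```

Run it with the same interpreter that TF was installed for; the printed name is a candidate for the entry after `${TENSORFLOW_LIBRARIES}` in the `LIBRARIES` line.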

sarlinpe commented 6 years ago

I ended up creating a Catkin package to build Tensorflow from source using the official CMake build.

moorage commented 6 years ago

@Skydes nice!! Did you use tensorflow_ros and/or tensorflow_ros_test with that catkin package? If so, which branch(es)?

sarlinpe commented 6 years ago

@moorage No, it doesn't use the pip wheel but builds from source instead.

peci1 commented 6 years ago

@Skydes @moorage I made a lot of improvements to the code of this package, and I also improved the documentation. Now the package is more like an interface to TF installed in various ways (including your catkin package). You can give it a try to see whether it fixes your issues.

In the documentation, I also tried to describe what exactly the C++ ABI problems are, so it might help you understand whether you're trying the right thing.

sarlinpe commented 6 years ago

Thanks a lot for the improvements! It seems much more usable now. I'm facing the C++ ABI problem, so I don't really need tensorflow_ros (for now at least) as the catkin package provides the interface I need.

FYI: I'm working on stripping out unnecessary GRPC/Python dependencies of tensorflow_catkin, so its build time should be reduced.

peci1 commented 6 years ago

Great. I had some problems building with tensorflow_catkin, which I solved by numerous hacks (mainly regarding gomp library, grpc and jpeg). Next week I'll try to summarize what I needed to do to actually succeed compiling on a pretty blank Xenial machine.

Anyway, the idea behind the new version of tensorflow_ros is to allow creating ROS packages that leave their users free to choose how they "supply" tensorflow. Some want to compile it, some are fine with the pip version, some already have a bazel build...