tradr-project / tensorflow_ros

Project moved to tensorflow_ros_cpp
https://github.com/tradr-project/tensorflow_ros_cpp
26 stars 10 forks source link

WIP: Add a potentially missing dependency. #7

Closed 130s closed 6 years ago

130s commented 6 years ago

FYI @jonlwowski012

I don't know if these changes are valid as I don't use tensorflow (yet), but in general missing dependencies should be added as upstream as possible, and this repo might be a good place to add (as I assume one level upper than this might be tensoflow repo, which I think change won't easily be merged and even if it does, new binary won't become available anytime soon).

In tf_python.cmake swig is required. Missing this results in build failure. Example from our CI trace:

:
Starting  >>> tensorflow_catkin

Errors     << tensorflow_catkin:make /root/catkin_ws/logs/tensorflow_catkin/build.make.000.log
Cloning into 'tensorflow_src'...

Checking out files:  26% (3002/11487)
:
Checking out files: 100% (11487/11487), done.

Note: checking out 'b874783ccdf4cc36cb3546e6b6a998cb8f3470bb'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at b874783... tf.Dimension raises TypeError for tf.DType (#17086)
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find SWIG (missing: SWIG_EXECUTABLE SWIG_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.5/Modules/FindSWIG.cmake:75 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  tf_python.cmake:432 (find_package)
  CMakeLists.txt:471 (include)

make[2]: *** [tensorflow_src-prefix/src/tensorflow_src-stamp/tensorflow_src-configure] Error 1
make[1]: *** [CMakeFiles/tensorflow_src.dir/all] Error 2
make: *** [all] Error 2
130s commented 6 years ago

Since Travis CI doesn't seem to be enabled for this repo, I'm running it with my fork. You can see at https://travis-ci.org/130s/tensorflow_ros/builds

peci1 commented 6 years ago

Thanks for the contribution!

As written in package.xml of tensorflow_ros, the dependency on tensorflow_catkin is optional. I added it to package.xml to somehow formalize the dependency, but it is only one of three ways how to get tensorflow and I want to let the user choose which one does he like.

So I suggest removing the rule for finding tensorflow_catkin. This means the rosdep install command won't be able to find it, but that's okay. The question is if the CI script is prepared for not taking this failure as fatal. Can you try that?

Besides, fetching and building tensorflow_catkin is really a bad thing, since it'd eat a LOT of buildfarm resources. We just need to test against the pip tensorflow.

Last, do you know how to configure the CI script to actually test this package by running code from tensorflow_ros_test? Because only that is a final confirmation that the infrastructure provided by tensorflow_ros works as intended (and, by design, the tests cannot be moved into tensorflow_ros).

130s commented 6 years ago

As written in package.xml of tensorflow_ros, the dependency on tensorflow_catkin is optional. I added it to package.xml to somehow formalize the dependency, but it is only one of three ways how to get tensorflow and I want to let the user choose which one does he like.

Now I see what you meant. I just didn't get what the optional tags meant at that time.

I guess the way you defined the optional dependency is fine and helpful from a human operator's perspective, but automated setting (e.g. CI) doesn't interpret that.

So I suggest removing the rule for finding tensorflow_catkin. This means the rosdep install command won't be able to find it, but that's okay. The question is if the CI script is prepared for not taking this failure as fatal. Can you try that?

Ok. I added a commit to remove tensorflow_catkin from package.xml, and comment it out in .rosinstall_ci.

Besides, fetching and building tensorflow_catkin is really a bad thing, since it'd eat a LOT of buildfarm resources. We just need to test against the pip tensorflow.

I see.

Regarding that, our team might be more leaning toward tensorflow_catkin option, and because CI apparently hits the issue you mentioned, we've built a Docker image where TF is pre-built and ROS is available via this pkg. Do you think of anywhere good place to push that kind of image? Right now we're hosting it on our private server but we'll probably not mind posting it somewhere common.

Last, do you know how to configure the CI script to actually test this package by running code from tensorflow_ros_test? Because only that is a final confirmation that the infrastructure provided by tensorflow_ros works as intended (and, by design, the tests cannot be moved into tensorflow_ros).

In the latest commit I'm trying it.

peci1 commented 6 years ago

Looks like we're both working on the same thing at the same time :)

I "forked" your PR into branch test-pr-1-fix, and I got the builds working on indigo, kinetic and melodic. I also added a unit test for indigo (old abi), and now I'm about to write one for the new abi. When it's done, I'll also try to add testing for several version of tensorflow from pip.

It seems we both chose the same way to do the changes, so I'll probably merge just the first two commits and then my changes, but I like your edits to readme ;)

With the dependency in package.xml, I'm not sure if commenting tensorflow_catkin out would not break the builds where it is actually used (because I think if you add something to catkin_package(CATKIN_DEPENDS ...) in CMakeLists, it must have at least a in the package.xml). Or have you tried with tensorflow_catkin and the dependency commented out?

As for the docker images - it would be awesome to have a ready-made image with tensorflow_catkin. There's an official ROS/OSRF page on docker hub, but I'm not sure if they'd like it there. I've never published any docker image, so I don't know the limitations, but can't you just put it on your personal page on docker hub?

130s commented 6 years ago

Closing in favor of https://github.com/tradr-project/tensorflow_ros/pull/8 that's based on top of the branch test-pr-1-fix.

Speaking of Docker registry, the one of ROS-Industrial might be interesting to try, as they already have some GPU images they might be interested in hosting TF images as well. I'll see once we're ready.