Open kgreenek opened 1 year ago
Somewhat related to https://github.com/mvukov/rules_ros2/issues/57. What you get above is expected behavior at the moment. Please let me know what's your use-case where you would want to run e.g. rosbag utils w/o invoking bazel.
We install our software onto a robot and it is run there. There is no bazel on the robot / production system. We do need to run rosbag record (and other ros2 tools) on the robot. So this is really important for us.
The flow looks like this:
On the robot, we need to be able to record rosbags. We do not / will not install bazel anywhere on the robot, nor do we compile code there.
Gotcha. This is not a problem with this repo, but related to packaging of bazel binaries. Long time ago I created a rule in https://github.com/mvukov/rules_ros/blob/main/third_party/packaging.bzl for this purpose (look at the main readme for more info). That solution is heavily based on rules_docker, just FYI. There is probably a simpler solution by now. At the time I created this rule rules_pkg didn't work well. The situation might have changed -- TBH, I would try a recent rules_pkg.
IIUC, you want to deploy on the robot several binaries, including a ros2_bag target. How do you group binaries to put them with Bazel into a single archive? Do you simply group them with a filegroup
? If this is the case, the rule I put together isn't really going to cut it -- check rules_pkg first :)
You could also check https://github.com/lalten/rules_appimage.
Excellent questions. We are still trying out different options.
I have used rules_pkg for deployment at some pretty large scale in the past with success -- although it had lots of quirks. I have also used rules_docker to push docker images and deploy that way.
In this case, I do think it is an issue with this repository because trying to run the ros2_bag target from bazel-bin fails, even before doing any kind of packaging / deployment.
For example, in my example repo, the following does not work:
bazel build //romalcpp/examples/ros2:bag
./bazel-bin/romalcpp/examples/ros2/bag info [path-to-bagfile]
Note that the bag target hasn't been packaged at all. I'm just running the binary that was created by the build command. It gives the error from the description.
So even if it were deployed correctly using rules_pkg or something else, I believe it would still fail to run.
OK, I see. Once you build the target, if you want to run it w/o bazel you actually have to do:
cd bazel-bin/package/binary.runfiles/workspace_name
./package/binary
And, indeed, rules_pkg alone wouldn't solve all this. For this purpose I generate a custom script within binary_pkg_tar
rule, see https://github.com/mvukov/rules_ros/blob/main/third_party/packaging.bzl#L265-L267.
For a statically linked binary with no file-deps, you can just invoke the binary from wherever you want. Bazel built ROS2 binaries have to be invoked from a correct folder. Well, any binary that has e.g. bazel built shared lib deps or data files.
Got it. I can confirm that cd'ing into the runfiles directory worked. Here is what I did:
cd bazel-bin/romalcpp/examples/ros2/bag.runfiles/__main__
./romalcpp/examples/ros2/bag info [path to bagfile]
Unfortunately, this doesn't help me in my deployment use-case. My plan was to symlink the bag command somewhere that is in the $PATH when deployed to the robot. Creating a symlink is supported by the rules_pkg command when the tar file is uncompressed. This way, the bag command can be used from anywhere to record data.
Imagine ssh'ing into the robot and being able to do this:
cd ~/data/
bag record --all
This will work because the binary from my bag
target is symlinked into the $PATH. When you run the command, it will create a bagfile named with the current timestamp in the current directory (e.g. ~/data/2023-04-17-12-12-00).
With the current implementation of the ros2_bag target, here is what our team would have to do on the robot in order to use it:
cd /opt/romalcpp/romalcpp/examples/ros2/bag.runfiles/__main__
./romalcpp/examples/ros2/bag record --all -o ~/data/my_bag_file
In this example scenario, /opt/romalcpp
is the install path where we unpackaged the binaries onto the robot (or in a docker container, but you get the idea).
I would argue that this is pretty un-ergonomic. It requires $PWD to be set to something that is deeply nested and unintuitive. And I also have to pass the -o
flag explicitly every time I run the bag command instead of allowing it to be generated with the current timestamp as in the first example.
Is there any way to support running ros2_bag targets without having a specific requirement on the current working directory?
Maybe there is an environment variable I could set? Or maybe the ros2_bag target could output a generated wrapper bash script that sets up the ros2 stuff to be relative to the path of the bash script itself?
I noticed that this appears to affect other ros2 targets and not just ros2_bag, e.g. ros2_cpp_binary, etc.
I played around a bit with the generated *_launch file for that target and I was able to make it run independently from $PWD by adding the following:
# Get the full path to the directory where this script is located.
dir="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
runfiles_dir="${dir}/target_name.runfiles/__main__"
Then for all the paths in that script, instead of making them relative to the current working directory, I made them relative to ${runfiles_dir}
. For example:
ament_prefix_path="${runfiles_dir}/path/to/package/target_name_launch_ament_setup"
I think that should work for the ros2_cpp_binary target at least. The ros2_bag target does not appear to have a generated wrapper bash script, but maybe something similar could be done for it too.
@kgreenek maybe you'll find https://github.com/lalten/rules_appimage helpful. Its purpose is to package a executable Bazel target into a portable executable.
That is potentially useful, thank you.
I updated the description for more clarity. The feature request here is unrelated to deployment / packaging. What I really need is the ability to run the ros2_bag binary from any directory, rather than requiring $PWD to be in the runfiles directory.
So, let's start with what happens with bazel run //path/to:a_bin
: bazel changes cwd to the corresponding runfiles dir and then invokes the binary. That would be a runfiles dir in bazel cache. Even in this case invoking e.g. bazel run //path/to:my_rosbag_binary -- -o foo
will create a dir somewhere in Bazel cache (foo is not absolute path).
That's for any binary that has some runtime references to files; nothing specific to this repo. This ensures that e.g. rpaths, referenced file/dir paths and symlinks are valid. I think you cannot escape this and need something like
(cd to/runfiles_dir && ./path/to/binary "$@")
... as I said above. You can try a custom Bazel rule e.g. based on my work mentioned above or create a shorthand/alias for this in e.g. bashrc.
It's not something very intuitive at first sight, but once you learn a bit about runfiles layout, it makes sense IMO.
I noticed that this appears to affect other ros2 targets and not just ros2_bag, e.g. ros2_cpp_binary, etc.
I played around a bit with the generated *_launch file for that target and I was able to make it run independently from $PWD by adding the following:
# Get the full path to the directory where this script is located. dir="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)" runfiles_dir="${dir}/target_name.runfiles/__main__"
Then for all the paths in that script, instead of making them relative to the current working directory, I made them relative to
${runfiles_dir}
. For example:ament_prefix_path="${runfiles_dir}/path/to/package/target_name_launch_ament_setup"
I think that should work for the ros2_cpp_binary target at least. The ros2_bag target does not appear to have a generated wrapper bash script, but maybe something similar could be done for it too.
Something like this will invalidate symlinks that a binary may use, I am afraid. Also, ROS nodes have hard-coded path to rwm impl shared library -- for this reason one should not change cwd of nodes/deployments.
Thank you for taking the time to write out a thorough explanation. I believe I understand how the runfiles layout is structured, and it makes sense what you're saying.
I still really want to find a way to avoid this cwd requirement. From looking at the code, I think all we need at runtime for C++ targets is the following:
I was able to solve the ament prefix path pretty easily by making that path relative to the directory of the launch script as described above.
Unfortunately, I have not been able to solve the problem of loading the rmw shared library. I thought I could solve it by specifying LD_LIBRARY_PATH in the launch script. However, that still did not seem to work. Even with LD_LIBRARY_PATH set, I get the following error:
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): failed to initialize rcl init options: failed to load shared library 'external/ros2_rmw_cyclonedds/librmw_cyclonedds.so' due to dlopen error: external/ros2_rmw_cyclonedds/librmw_cyclonedds.so: cannot open shared object file: No such file or directory, at external/ros2_rcutils/src/shared_library.c:99, at external/ros2_rmw_implementation/rmw_implementation/src/functions.cpp:88, at external/ros2_rcl/rcl/src/rcl/init_options.c:75
To be honest, I really don't understand why setting LD_LIBRARY_PATH doesn't work. I feel like I'm getting pretty close here to finding a workable solution.
Another avenue that could work is using ${ORIGIN} when loading the shared libraries, so the paths are relative to the binary at runtime. More discussion about that here: https://stackoverflow.com/a/38058270
Also here is a fork where I am playing around with some ideas: https://github.com/kgreenek/rules_ros2/commit/85b75f46567a483059a4176888c7b5ec538cd181
I'd like to give this a gentle bump because it is becoming increasingly painful for me. I am distributing scripts (i.e. python binaries) to an external partner for them to work with ros2 bag files. Think simple parsing scripts that take as input ros2 bags, and can output files like csv files. These scripts can also make API calls and do some other stuff too.
These scripts are a lot more ergonomic to use if the paths do not have to be all absolute paths. It is convenient to drop a file in the current working directory with a default name for example -- similar to what the ros2bag binary does by default with doing ros2 bag record. It's also not intuitive that relative paths can't be used when passing args to the scripts.
Currently, here is the work-around I have found to enable relative paths for my scripts:
My struggle is that I'm having to do this for every script I write that uses paths.
I know bazel has a particular layout structure, but I do not think it is bazel's intention to require the binaries it produces to have to be run from a specific directory. I would really love if there is a way for the ros2 binaries to use paths that are relative to the binary where it is installed, rather than relying on the current working directory.
One other idea is to support setting an environment variable that points to the runfiles directory. Then I would still need a wrapper script, but it would be simply a matter of setting the right path in the environment variable in that script, rather than having to parse the args and replace all the paths with absolute paths.
I'm also very open to any other ideas that would make this easier. Thanks for any help!
I don't have a solid answer for this, yet. Perhaps put a (draft) PR together and then we can iterate on that.
I'm facing a similar issue. My use case is simply being able to run ros2 bag record
from anywhere and have the recording created in a directory relative to the current working directory.
Looks like @kgreenek figured out what the two core issues are that prevent us from simply calling the binary from elsewhere. Also @kgreenek's approach to solving them sounds reasonable to me, namely making AMENT_PREFIX_PATH
absolute and figuring out a way that the rmw shared library is found at runtime.
@mvukov , about making AMENT_PREFIX_PATH
absolute you write:
Something like this will invalidate symlinks that a binary may use, I am afraid.
What exactly is the issue with the symlinks? In a standard ROS setup, I think AMENT_PREFIX_PATH
would almost always contain only absolute paths. And if I'm not mistaken, we set AMENT_PREFIX_PATH
for stock ROS code to use it. So that code will surely be able to deal with absolute paths there?
Also, ROS nodes have hard-coded path to rwm impl shared library -- for this reason one should not change cwd of nodes/deployments.
That issue I see, but I'm sure we'd be able to find an acceptable solution for this. Maybe that solution would also fix #57.
@kgreenek, the reason setting LD_LIBRARY_PATH
didn't help is the behavior of dlopen
that is eventually used:
[...] If filename contains a slash ("/"), then it is interpreted as a (relative or absolute) pathname. Otherwise, the dynamic linker searches for the object as follows (see ld.so(8) for further details): [...] o If, at the time that the program was started, the environment variable LD_LIBRARY_PATH was defined to contain a colon-separated list of directories, then these are searched. [...]
Since we give a path with slashes in it, dlopen()
just looks in that place and doesn't care about LD_LIBRARY_PATH
at all.
I think the best solution to combine Bazel's "static" dynamic linking using RPATH
and what a stock ROS setup expects is linking against the rmw shared library via Bazel and only give the name of the shared library instead of the entire path. Bazel will make sure that the relative path ends up in RPATH
and the loader will find it at runtime. RPATH
entries can contain $ORIGIN
and this way be relative to the binary.
What do you think?
What exactly is the issue with the symlinks?
You're right, setting up a fixed ament prefix path is should work. I was more shooting at symlinks that Bazel generates, e.g. https://github.com/mvukov/rules_ros2/blob/e5ca0b2f86a3e9f1b97ff8d6b2a8e3e03185ad81/ros2/ament.bzl#L154. Without checking links, we may still need to launch e.g. the rosbag app in it's runfiles folder.
What do you think?
I think this dlopen-based approach is fine if it works. :+1: Please open a draft PR and let's iterate on this.
Maybe a bit off-the-topic, but stuff like https://github.com/mvukov/rules_ros2/blob/main/examples/foxglove_bridge/BUILD.bazel#L33-L34 probably won't work if cwd is something different than the runfiles folder.
For my use case described above I implemented a workaround with a wrapper that is just a shell script. It stores CWD in an environment variable, then calls the Bazel target of another wrapper. This wrapper sets up some more environment variables, one of which is LD_LIBRARY_PATH. Then changes to the path saved in the very beginning and calls the actual ros_bag binary from there. A patch to ament setup makes the ament prefix absolute and another patch checks the path of the rmw lib for existence. If it's not found, it strips the path and only kees the filename, which makes LD_LIBRARY_PATH being considered.
This works for now, but is obviously quite hacky. I will try the cleaner approach outlined above, but that will take a while. Going on vacation now first and then not sure how the priorities will be. So no promises...
I am having the same issue, my goal is to build a binary that can run without ROS, this binary will parses a rosbag and convert the data inside into another format, the only dependency I have is @ros2_rosbag2//:rosbag2_cpp I can build my ros2_cpp_binary without problem, but when running it, I got following error
[ERROR] [1703794883.367811564] [rosbag2_storage]: Unable to create class load instance: Environment variable 'AMENT_PREFIX_PATH' is not set or empty
I ran into another potential snag. I created a
bag
target like so:When I run it using bazel, it works as expected. However, when I run it directly from bazel-bin, I get the following error:
Here is the example repository I have been using if that it helpful: https://github.com/romalorg/romalcpp