ros2 / rmw_zenoh

RMW for ROS 2 using Zenoh as the middleware
Apache License 2.0
144 stars 29 forks source link

Cannot make service calls with debug build #177

Closed hwoithe closed 1 month ago

hwoithe commented 1 month ago

I am unable to run service calls with a debug build. I am running commit a8680fbcf51bded8c0612237150461dbe892a1b5 of rmw_zenoh. The build is compiled with colcon build --cmake-args -DCMAKE_BUILD_TYPE=Debug.

terminal 1:

$ ros2 run rmw_zenoh_cpp rmw_zenohd

terminal 2:

$ export RMW_IMPLEMENTATION=rmw_zenoh_cpp
$ ros2 run demo_nodes_cpp add_two_ints_server

terminal 3:

$ export RMW_IMPLEMENTATION=rmw_zenoh_cpp
$ ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts '{a: 1, b: 2}'
...
requester: making request: example_interfaces.srv.AddTwoInts_Request(a=1, b=2)

thread '<unnamed>' panicked at library/core/src/panicking.rs:156:5:
unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
stack backtrace:
   0:     0x7f00625f7022 - std::backtrace_rs::backtrace::libunwind::trace::he4ee80166a02c846
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
  19:     0x7f0062486b9d - core::slice::raw::from_raw_parts::h2d94e6ae769baa20
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/slice/raw.rs:98:9
  20:     0x7f00615b1fb6 - zenohcd::get::<impl core::convert::From<&zenohcd::get::z_value_t> for zenoh::value::Value>::from::h7727944f10801f9b
                               at  <***removed***>/ws_rmw_zenoh/build/zenoh_c_vendor/zenoh_c_vendor-prefix/src/zenoh_c_vendor/src/get.rs:308:48
  21:     0x7f00615cd296 - <T as core::convert::Into<U>>::into::h24970292011be035
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/convert/mod.rs:759:9
  22:     0x7f006165beba - zenoh::query::GetBuilder<Handler>::with_value::h2ca68547106c8b13
                               at  <***removed***>/.cargo/git/checkouts/zenoh-cc237f2570fab813/763a05f/zenoh/src/query.rs:309:27
  23:     0x7f0061618f35 - z_get
                               at  <***removed***>/ws_rmw_zenoh/build/zenoh_c_vendor/zenoh_c_vendor-prefix/src/zenoh_c_vendor/src/get.rs:224:13
  24:     0x7f0062be75a6 - rmw_send_request
                               at  <***removed***>/ws_rmw_zenoh/src/rmw_zenoh/rmw_zenoh_cpp/src/rmw_zenoh.cpp:2282:8
  25:     0x7f0064c51615 - rcl_send_request
  26:     0x7f00647ba40f - <unknown>
...
  91:     0x7f0065754e40 - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:392:3
  92:     0x55838b5cef25 - _start
  93:                0x0 - <unknown>
thread caused non-unwinding panic. aborting.
Aborted (core dumped)
 <***removed***>

As we can see from the output above, the from_raw_parts call fails: https://github.com/eclipse-zenoh/zenoh-c/blob/390ec14e1b3785cd771bd6931db65062222bb735/src/get.rs#L308

I believe this is because val.encoding.suffix.start is set to null when the encoding is created: https://github.com/ros2/rmw_zenoh/blob/a8680fbcf51bded8c0612237150461dbe892a1b5/rmw_zenoh_cpp/src/rmw_zenoh.cpp#L2279 https://github.com/eclipse-zenoh/zenoh-c/blob/390ec14e1b3785cd771bd6931db65062222bb735/src/commons.rs#L494 https://github.com/eclipse-zenoh/zenoh-c/blob/390ec14e1b3785cd771bd6931db65062222bb735/src/collections.rs#L35

The from_raw_parts documentation indicates "data must be non-null": https://doc.rust-lang.org/beta/core/slice/fn.from_raw_parts.html#safety

It would probably be best if zenoh-c does the right thing after checking if val.encoding.suffix.start is null, possibly by using an empty string if it is. I am not familiar enough with Zenoh to determine if this is a solution or if rmw_zenoh is creating the encoding in an unexpected way.

I am using rustc 1.78.0 (9b00956e5 2024-04-29).

Yadunund commented 1 month ago

@JEnoch would you be able to take a look at this problem?

JEnoch commented 1 month ago

Hi @hwoithe, Thanks for reporting this!

You're right, in Zenoh Rust the Encoding's Suffix is always returned as a &str which cannot be NULL. So in our default cases the suffix is always an empty string:

I guess such empty string, even with a length of 0, is not a NULL pointer and therefore leads to safer behaviour than with suffix set to z_bytes_t::empty(). We'll do a PR in zenoh-c to fix this.

Still, I think that in rmw_zenoh the opts.value.encoding could not changed, as it's already initialised by default to z_encoding_default() corresponding to the Empty encoding. @Yadunund if you agree I'll just remove all the settings of encodings to z_encoding(Z_ENCODING_PREFIX_EMPTY, NULL); (I count 4), as zenoh-c always initialises encoding in options with z_encoding_default().

As a side note, I didn't manage to reproduce the error, neither in Debug nor Release. But it probably depends on the memory alignment.

Yadunund commented 1 month ago

@JEnoch thanks for investigating the issue so quickly. The proposed approach sounds good to me.