mavlink / mavros

MAVLink to ROS gateway with proxy for Ground Control Station
Other
902 stars 993 forks source link

ROS2 `sys_status` plugin services callbacks not executing #1588

Open blakermchale opened 3 years ago

blakermchale commented 3 years ago

Issue details

I've built mavros in ROS Foxy and am running it using mavros_node. Services like ~/arming work for the command plugin. However, for the sys_status plugin I cannot call any of the services like set_mode or vehicle_info_get. I am using the command:

ros2 service call /mavros/set_mode mavros_msgs/srv/SetMode "{'custom_mode': 'AUTO.TAKEOFF'}"

This service call ends up hanging and never returning. It just says this message:

requester: making request: mavros_msgs.srv.SetMode_Request(base_mode=0, custom_mode='AUTO.TAKEOFF')

I've added in rclcpp log info messages to the service callbacks and determined they are never being entered since the messages are not logged.

MAVROS version and platform

Mavros: 2.0.3 ROS: Foxy Ubuntu: 20.04

Autopilot type and version

[ ] ArduPilot [X] PX4

Version: master or 9d3e18dbe10700a13f3048f2807dc4b1a62eee62

Node logs

[mavros_node-5] [INFO][main():52]: Starting mavros_node container
[mavros_node-5] [INFO][main():53]: FCU URL: udp://:14540@127.0.0.1:14557
[mavros_node-5] [INFO][main():54]: GCS URL: 
[mavros_node-5] [INFO][main():55]: UAS Prefix: /uas1
[mavros_node-5] [INFO][main():57]: Starting mavros router node
[mavros_node-5] [INFO][Router():183]: Built-in SIMD instructions: SSE, SSE2
[mavros_node-5] [INFO][Router():184]: Built-in MAVLink package version: 2021.5.5
[mavros_node-5] [INFO][Router():185]: Known MAVLink dialects: common ardupilotmega ASLUAV all development icarous matrixpilot paparazzi standard uAvionix ualberta
[mavros_node-5] [INFO][Router():186]: MAVROS Router started
[mavros_node-5] [INFO][add_endpoint():108]: Requested to add endpoint: type: 0, url: udp://:14540@127.0.0.1:14557
[mavros_node-5] [INFO][add_endpoint():138]: Endpoint link[1000] created
[mavros_node-5] [INFO][add_endpoint():142]: link[1000] opened successfully
[mavros_node-5] [INFO][add_endpoint():108]: Requested to add endpoint: type: 2, url: /uas1
[mavros_node-5] [INFO][add_endpoint():138]: Endpoint link[1001] created
[mavros_node-5] [INFO][add_endpoint():142]: link[1001] opened successfully
[mavros_node-5] [INFO][main():75]: Starting mavros uas node
[mavros_node-5] [INFO][operator()():101]: UAS Executor started
[mavros_node-5] [INFO][add_plugin():266]: Plugin actuator_control created
[mavros_node-5] [INFO][add_plugin():321]: Plugin actuator_control added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin actuator_control initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin altitude created
[mavros_node-5] [INFO][add_plugin():321]: Plugin altitude added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin altitude initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin command created
[mavros_node-5] [INFO][add_plugin():321]: Plugin command added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin command initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin ftp created
[mavros_node-5] [INFO][add_plugin():321]: Plugin ftp added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin ftp initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin geofence created
[mavros_node-5] [INFO][add_plugin():321]: Plugin geofence added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin geofence initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin global_position created
[mavros_node-5] [INFO][add_plugin():321]: Plugin global_position added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin global_position initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin home_position created
[mavros_node-5] [INFO][add_plugin():321]: Plugin home_position added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin home_position initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin imu created
[mavros_node-5] [INFO][add_plugin():321]: Plugin imu added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin imu initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin local_position created
[mavros_node-5] [INFO][add_plugin():321]: Plugin local_position added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin local_position initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin manual_control created
[mavros_node-5] [INFO][add_plugin():321]: Plugin manual_control added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin manual_control initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin param created
[mavros_node-5] [INFO][add_plugin():321]: Plugin param added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin param initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin rallypoint created
[mavros_node-5] [INFO][add_plugin():321]: Plugin rallypoint added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin rallypoint initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin rc_io created
[mavros_node-5] [INFO][add_plugin():321]: Plugin rc_io added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin rc_io initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_accel created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_accel added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_accel initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_attitude created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_attitude added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_attitude initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_position created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_position added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_position initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_raw created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_raw added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_raw initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_trajectory created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_trajectory added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_trajectory initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin setpoint_velocity created
[mavros_node-5] [INFO][add_plugin():321]: Plugin setpoint_velocity added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin setpoint_velocity initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin sys_status created
[mavros_node-5] [INFO][add_plugin():321]: Plugin sys_status added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin sys_status initialized
[mavros_node-5] [INFO][operator()():172]: TM: Timesync mode: MAVLINK
[mavros_node-5] [INFO][add_plugin():266]: Plugin sys_time created
[mavros_node-5] [INFO][add_plugin():321]: Plugin sys_time added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin sys_time initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin waypoint created
[mavros_node-5] [INFO][add_plugin():321]: Plugin waypoint added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin waypoint initialized
[mavros_node-5] [INFO][add_plugin():266]: Plugin wind_estimation created
[mavros_node-5] [INFO][add_plugin():321]: Plugin wind_estimation added to executor
[mavros_node-5] [INFO][add_plugin():325]: Plugin wind_estimation initialized
[mavros_node-5] [INFO][operator()():175]: Built-in SIMD instructions: SSE, SSE2
[mavros_node-5] [INFO][operator()():178]: Built-in MAVLink package version: 2021.5.5
[mavros_node-5] [INFO][operator()():179]: Known MAVLink dialects: common ardupilotmega ASLUAV all development icarous matrixpilot paparazzi standard uAvionix ualberta
[mavros_node-5] [INFO][operator()():180]: MAVROS UAS via /uas1 started. MY ID 1.191, TARGET ID 1.1
[spawn_entity.py-3] [INFO][main():361]: Spawn Entity started
[spawn_entity.py-3] [INFO][run():143]: Loading entity XML from file /tmp/drone_0.sdf
[spawn_entity.py-3] [INFO][_spawn_entity():268]: Waiting for service /spawn_entity, timeout = 120
[spawn_entity.py-3] [INFO][_spawn_entity():271]: Waiting for service /spawn_entity
[spawn_entity.py-3] [INFO][_spawn_entity():280]: Calling service /spawn_entity
[mavros_node-5] [INFO][recv_message():355]: link[1001] detected remote address 1.191
[spawn_entity.py-3] [INFO][_spawn_entity():284]: Spawn status: SpawnEntity: Successfully spawned entity [drone_0]
[INFO] [spawn_entity.py-3]: process has finished cleanly [pid 28272]
[mavros_node-5] [INFO][recv_message():355]: link[1000] detected remote address 1.1
[mavros_node-5] [INFO][handle_highres_imu():421]: IMU: High resolution IMU detected!
[mavros_node-5] [INFO][log_connect_change():419]: CON: Got HEARTBEAT, connected. FCU: PX4 Autopilot
[mavros_node-5] [INFO][connection_cb():143]: WP: detected enable_partial_push: 0
[mavros_node-5] [INFO][handle_highres_imu():421]: IMU: High resolution IMU detected!
[mavros_node-5] [INFO][handle_attitude_quaternion():374]: IMU: Attitude quaternion IMU detected!
[mavros_node-5] [INFO][arming_cb():373]: Got arming service call
[mavros_node-5] [INFO][request_mission_done():536]: WP: mission received
[mavros_node-5] [INFO][request_mission_done():536]: RP: mission received
[mavros_node-5] [INFO][request_mission_done():536]: GF: mission received

Diagnostics

---
header:
  stamp:
    sec: 1625026107
    nanosec: 809428903
  frame_id: ''
status:
- level: "\0"
  name: 'mavros_router: MAVROS Router'
  message: ok
  hardware_id: none
  values:
  - key: Endpoints
    value: '2'
  - key: Messages routed
    value: '107367'
  - key: Messages sent
    value: '107367'
  - key: Messages dropped
    value: '0'
- level: "\0"
  name: 'mavros_router: endpoint 1000: udp://:14540@127.0.0.1:14557'
  message: ok
  hardware_id: none
  values:
  - key: Received packets
    value: '39190'
  - key: Dropped packets
    value: '0'
  - key: Buffer overruns
    value: '0'
  - key: Parse errors
    value: '0'
  - key: Rx sequence number
    value: '22'
  - key: Tx sequence number
    value: '0'
  - key: Rx total bytes
    value: '6301551'
  - key: Tx total bytes
    value: '65966'
  - key: Rx speed
    value: '24036.000000'
  - key: Tx speed
    value: '250.000000'
  - key: Remotes count
    value: '3'
  - key: Remote [0]
    value: '0.0'
  - key: Remote [1]
    value: '1.0'
  - key: Remote [2]
    value: '1.1'
- level: "\0"
  name: 'mavros_router: endpoint 1001: /uas1'
  message: ok
  hardware_id: none
  values:
  - key: Remotes count
    value: '3'
  - key: Remote [0]
    value: '0.0'
  - key: Remote [1]
    value: '1.0'
  - key: Remote [2]
    value: '1.191'
---
header:
  stamp:
    sec: 1625026107
    nanosec: 821809619
  frame_id: ''
status:
- level: "\0"
  name: 'mavros: MAVROS UAS'
  message: connected
  hardware_id: uas:///uas1
  values: []
- level: "\0"
  name: 'mavros: GPS'
  message: 3D fix
  hardware_id: uas:///uas1
  values:
  - key: Satellites visible
    value: '10'
  - key: Fix type
    value: '3'
  - key: EPH (m)
    value: '0.00'
  - key: EPV (m)
    value: '0.00'
- level: "\x02"
  name: 'mavros: System'
  message: Sensor health
  hardware_id: uas:///uas1
  values:
  - key: Sensor present
    value: '0x1028002C'
  - key: Sensor enabled
    value: '0x1021000C'
  - key: Sensor health
    value: '0x1028002F'
  - key: 3D magnetometer
    value: Ok
  - key: absolute pressure
    value: Ok
  - key: rc receiver
    value: Fail
  - key: AHRS subsystem health
    value: Ok
  - key: pre-arm check status. Always healthy when armed
    value: Ok
  - key: CPU Load (%)
    value: '32.0'
  - key: Drop rate (%)
    value: '0.0'
  - key: Errors comm
    value: '0'
  - key: 'Errors count #1'
    value: '0'
  - key: 'Errors count #2'
    value: '0'
  - key: 'Errors count #3'
    value: '0'
  - key: 'Errors count #4'
    value: '0'
- level: "\0"
  name: 'mavros: Battery'
  message: Normal
  hardware_id: uas:///uas1
  values:
  - key: Voltage
    value: '15.47'
  - key: Current
    value: '-1.0'
  - key: Remaining
    value: '67.0'
- level: "\0"
  name: 'mavros: Heartbeat'
  message: Normal
  hardware_id: uas:///uas1
  values:
  - key: Heartbeats since startup
    value: '263'
  - key: Frequency (Hz)
    value: '0.999996'
  - key: Vehicle type
    value: Quadrotor
  - key: Autopilot type
    value: PX4 Autopilot
  - key: Mode
    value: AUTO.LOITER
  - key: System status
    value: STANDBY

Check ID

Router topic: /uas1/mavlink_source, target: 1.1
ERROR. I got 0 addresses, but not your target 1.1
---
Received 0, from 0 addresses
address   list of messages
vooon commented 3 years ago

I seen that service call problems, but don't really know what's wrong. It could be very random, any plugin can become silent. I also seen that sometimes topics also can stop working. I suspect that problem related with networking, but i blamed docker...

Note that in mavros v2 each plugin has a separate node instance, so on graph you'll see lots of nodes like <uas>.<plugin>. That fact likely resulting in that only some plugins become silent.


Diagnostics exists, just command should be like ros2 topic echo /diagnostics. Check ID - ros2 run mavros mav checkid.

blakermchale commented 3 years ago

I added diagnostics and checkid to my initial post. I also noticed that checkid does not support namespacing outside of the default.

The topics for sys all work fine for publishing at least. I tried sending a message to /mavros/statustext/send and found the callback was never called, similar to the services.

vooon commented 3 years ago

Try to use that option: https://github.com/mavlink/mavros/blob/f1dad6c21b767ef2d3e1bff57952defc318c49bf/mavros/mavros/cmd/__init__.py#L76-L80

Note that it should be passed to mav (that's how click works...). E.g. ros2 run mavros mav --mavros-ns /some/uas2 checkid.

Since services build on top of DDS topics, i assume that we would have problems with both subscribers and service servers. In simplification client sends request to request topic, which service server is subscribed to, and waits a response on response topic, which newer came.

I saw problems even with ros2 topic list ain't showing anything, but still suppose that there may be problems with udp multicast in my network / docker network. I don't do much on Galactic, where they switched default to another DDS impl. Maybe it's worth to try?

blakermchale commented 3 years ago

Thanks for that info. I actually found that passing a namespace to mavros_node does not append a a namespace to the uas topics. For example, these are the topics I see for uas even though all other mavros topics are /drone_0/mavros/...:

/uas1/mavlink_sink
/uas1/mavlink_source

Even with using the --mavros-ns argument it still doesn't work with namespaces.

I've been working in Foxy mostly but tried Galactic today and it did not seem to change the results. I still couldn't get the service callback to work.

vooon commented 3 years ago

Ah, yes, that's because: https://github.com/mavlink/mavros/blob/f1dad6c21b767ef2d3e1bff57952defc318c49bf/mavros/src/mavros_node.cpp#L50

But with multi-drone scenario you don't have to use mavros_node. You need to start one Router (read ros2 composite nodes tutorial), and instruct it to connect to drones and uas.

mavros_router:
  ros__parameters:
    fcu_urls:
      - udp://:14550@
      - udp://:14551@
    uas_urls:
      - /uas1
      - /uas2

Then you can run 2 (or more) UAS with different names (namespaces). Each of them need to have it's own uas link.

Please note that you shuld use different system ids on each system, otherwise you may get wide list of problems.

orchydea commented 3 years ago

Hi! @vooon @blakermchale . I'm facing the same issue on ROS2 Galactic. Have you made any progress on this?

blakermchale commented 3 years ago

@orchydea I haven't found any solution yet and have primarily used RTPS instead as a replacement to mavros

orchydea commented 3 years ago

That's a shame @blakermchale. I will try using RTPS or ros1_brige. I ran mavros on noetic and this issue does not seem to happen.

Thanks!

IanBurwell commented 3 years ago

+1 I'm having this issue on ROS2 Foxy as well. Might have to figure out the ros1_bridge...

orchydea commented 3 years ago

@IanBurwell I ended up porting everything to ROS1 (Noetic). I would recommend to do so if your project is at an early stage. It's a pity that this does not work on ROS2 :cry:

IanBurwell commented 3 years ago

I've possibly found the workaround of just sending raw MavLink commands, such as the following to force RTL. This is for sure a workaround but I am committed to using ROS2. Worst comes to worst I set up ros1_bridge.

ros2 service call /mavros/cmd/command mavros_msgs/srv/CommandLong "{broadcast: false, command: 176, confirmation: 0, param1: 157, param2: 4, param3: 5, param4: 0.0, param5: 0.0, param6: 0.0, param7: 0.0}"
esharet commented 3 years ago

I'm facing the same problem, Ros2 Foxy. when publishing to the topic /mavros/rc/override, the topic of /mavros/rc/override updated but /mavros/rc/in or out doesn't. The plugin rc_io created, added to executor and initialized. Do you have an idea how to fix that?

vooon commented 3 years ago

Nope. But i'm thinking to try use separate executor for each plugin.

Vicidel commented 2 years ago

I have an issue which I think is related to this one When doing ros2 param list after MAVROS2 booted completely/correctly I have an error AttributeError: 'NoneType' object has no attribute 'result'

If it's unrelated I can open a separate issue.

More details of the command output:

# ros2 param list
/mavros:
  component_id
  diagnostic_updater.period
  fcu_protocol
  plugin_allowlist
  plugin_denylist
  qos_overrides./parameter_events.publisher.depth
  qos_overrides./parameter_events.publisher.durability
  qos_overrides./parameter_events.publisher.history
  qos_overrides./parameter_events.publisher.reliability
  qos_overrides./tf.publisher.depth
  qos_overrides./tf.publisher.durability
  qos_overrides./tf.publisher.history
  qos_overrides./tf.publisher.reliability
  qos_overrides./tf_static.publisher.depth
  qos_overrides./tf_static.publisher.history
  qos_overrides./tf_static.publisher.reliability
  system_id
  target_component_id
  target_system_id
  uas_url
  use_sim_time
/mavros_node:
  fcu_url
  gcs_url
  qos_overrides./parameter_events.publisher.depth
  qos_overrides./parameter_events.publisher.durability
  qos_overrides./parameter_events.publisher.history
  qos_overrides./parameter_events.publisher.reliability
  tgt_component
  tgt_system
  use_sim_time
/mavros_router:
  diagnostic_updater.period
  fcu_urls
  gcs_urls
  qos_overrides./parameter_events.publisher.depth
  qos_overrides./parameter_events.publisher.durability
  qos_overrides./parameter_events.publisher.history
  qos_overrides./parameter_events.publisher.reliability
  uas_urls
  use_sim_time
Exception while calling service of node '/mavros/sys': None
Traceback (most recent call last):
  File "/opt/ros/galactic/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.13.1', 'console_scripts', 'ros2')()
  File "/opt/ros/galactic/lib/python3.8/site-packages/ros2cli/cli.py", line 67, in main
    rc = extension.main(parser=parser, args=args)
  File "/opt/ros/galactic/lib/python3.8/site-packages/ros2param/command/param.py", line 39, in main
    return extension.main(args=args)
  File "/opt/ros/galactic/lib/python3.8/site-packages/ros2param/verb/list.py", line 116, in main
    sorted_names = sorted(response.result.names)
AttributeError: 'NoneType' object has no attribute 'result'
Zaaler commented 2 years ago

@vooon

Nope. But i'm thinking to try use separate executor for each plugin.

Did you have any success doing this? Could this be achieved by converting the sys status plugin into its own node and adding it to its own executor?

vooon commented 2 years ago

@Zaaler sorry, i hadn't had time to try that yet.

No, it does not have any sense to strip out any plugin. By design they must be loaded to UAS node, which provides some shared state and other abstractions.

conroy-cheers commented 2 years ago

I'm having the same problem. mavros running on Foxy in a Docker container, latest build from source.

FROM ros:foxy-ros-core-focal

RUN apt-get update && apt-get install -qy --no-install-recommends \
    python3-colcon-common-extensions \
    build-essential gcc git \
    iputils-ping \
    geographiclib-tools \
    python3-vcstool python3-rosinstall-generator python3-osrf-pycommon python3-rosdep

SHELL ["/bin/bash", "-c"]

RUN mkdir -p /ros2_ws/src
WORKDIR /ros2_ws

RUN rosdep init && rosdep update

# install mavros from source
RUN git config --global advice.detachedHead false
RUN rosinstall_generator --format repos mavlink | tee /tmp/mavlink.repos && \
    rosinstall_generator --format repos --upstream-development mavros | tee -a /tmp/mavros.repos && \
    vcs import src < /tmp/mavlink.repos && \
    vcs import src < /tmp/mavros.repos && \
    rosdep install --from-paths src --ignore-src -y && \
    ./src/mavros/mavros/scripts/install_geographiclib_datasets.sh
RUN /bin/bash -c "source /opt/ros/foxy/setup.bash && colcon build"

# ADD ./src /ros2_ws/src
RUN /bin/bash -c "source /opt/ros/foxy/setup.bash && colcon build"

ADD ./params.yaml .

ENV ROS_DOMAIN_ID=42
CMD /bin/bash -c "source install/setup.bash && ros2 run mavros mavros_node --ros-args -r __ns:=/copter --params-file params.yaml"

params.yaml contents:

/**:
  ros__parameters:
    fcu_url: udp://copter:14551@iris:14555

Simulated vehicle is running ArduCopter-4.1 SITL in a separate iris container.

Subscribing to global_position and imu works fine. I can arm the vehicle by calling cmd/arming. But set_mode never returns.

Are there any configurations for which this does work? Or is this just broken for now?

Zaaler commented 2 years ago

I've been debugging. If you comment out any callback that makes a client to the cmd/command service name, the system works fine. Those two callbacks are the autopilot_version_cb and the set_message_interval_cb. I plan on continuing to debug it hopefully today. I would assume any plugin that creates a client with cmd/command will cause that plugin to start to hang. From a grep, I found that includes the home_position, sys_status, and mount_control plugins in cpp.

Maybe @vooon has some insight from this information. It does sound like those plugins are failing to spin which is causing the client to never receive the response. From placing a print in the cmd/command service, I know that service is processing the request and attempting to send back a response. Just seems like the client end isn't continuing to try and receive it.

Any thoughts? Also, where does this issue's importance currently rank in the ROS2 development of MAVROS?

Thanks

vooon commented 2 years ago

Maybe i messed up with spinning futures? Unsure. Also in most recent versions of ROS2 it's possible to write async service server, which may be helpful for services like message_interval, which clearly translates to command_long.

That bug is a show stopper. So yes, it's important :)

vooon commented 2 years ago

@Zaaler i'm not sure what i've changed, but now i can receive topics (status, diag, imu), call some of services (e.g. params pull). Sometimes i have troubles with /mavros/set_mode using ros2 service call, but now it works fine with mav sys mode.

Zaaler commented 2 years ago

@vooon I see that the mavros node can be brought up and some of the services can be used now. However, if I bring up the node, and call the set_message_interval service, it hangs still. Once the hanging occurs, all other services can no longer be called like set_mode. Situation seems to persist.

Are the times that you have trouble using /mavros/set_mode after you tried to call a service that doesn't return?

vooon commented 2 years ago

Yes, i tried to use mav sys after hanging (and killing) ros2 service. But i'm unsure whether it be related to amount of runner threads or not. In my case executor uses 8 threads. I've added thread count report in startup.

I'm still unsure how properly use futures because rclcpp2::spin_future...() throws error saying that the node is already spinning (by UAS::exec, yes).

Michel1968 commented 2 years ago

I found this issue after trying to use the set_mode service and seeing no service responds.

After reading @Zaaler comments and @voon responses, I checked the sys_status.cpp plugin code (ros2 version 2.0.3) and found that in set_message_interval_cb() and in autopilot_version_cb() an asynchronous service request is done to "cmd/command" service and then the code waits for the future response, I talking about code like this: (lines 1083- 1085 on sys_status.cpp) auto future = client->async_send_request(cmdrq); auto response = future.get(); ret = response->success;

I think there is problem with the future.get() : if the requested service run on the same thread as the requester, then this will never happen because the node that needs to handle the request did not received yet the request, causing the thread that runs both services to get stuck, for more details see: Synchronous vs. asynchronous service clients — ROS 2 Documentation: Foxy documentation: [https://docs.ros.org/en/foxy/How-To-Guides/Sync-Vs-Async.html#id3]

This can be checked by adding a future status check with timeout regarding the future status, the code after the change: auto future = client->async_send_request(cmdrq); const auto futureStatus = future.wait_for(1s); if (futureStatus == std::future_status::ready) { auto response = future.get(); ret = response->success; }

I added such a code to both methods: set_message_interval_cb() and in autopilot_version_cb(), I found that the problem was due to autopilot_version_cb() that is called from a periodic timer, it get stuck and then the SystemStatusPlugin “sys” plugin services will not respond because the thread running the node’s spinner is stuck. After the change autopilot_version_cb() logs the error: VER: command plugin service call failed! Then if you try to use the set_mode service it responds.

The changes I did to the code are not a full solution to the problem, just a way to avoid this plugin to get stuck. I think that a full solution requires assuring that plugins using services from other plugins have spinners running on different threads, one way to achieve this is to use mutually exclusive callback groups, please see: Executors — ROS 2 Documentation: Foxy documentation [https://docs.ros.org/en/foxy/Concepts/About-Executors.html?highlight=callback%20groups] and the example: [examples/rclcpp/executors/cbg_executor]

vooon commented 2 years ago

@Michel1968 current ros2 branch has group set for command plugin. But all plugins share same multithreaded runner. Also each plugin is separate node, so i'm not quire sure that we can use sg of one node for clients from another one.

Wait_for looks more like a hack, as i suppose it still blocks. But definitely better than complete hung. Can you please make a PR?

Michel1968 commented 2 years ago

@voon

  1. When I mentioned callback groups I meant ROS2 callback groups that can be associated to executors and to specific subscriptions/publishers/services, not to Linux groups as it seem from your sg mention
  2. yes the wait_for() call is a hack just to avoid the spinner's thread to stuck. I just wanted to check my theory I did not intended to fix the problem.
  3. I'm afraid it might take me a few days before I can make a PR, meanwhile I'm attaching the source file I changed, it was taken from mavros version 2.0.3 sys_status.zip
rob-clarke commented 2 years ago

After playing around with this, I found I was able to reliably prevent the "cmd/command futures not returning" error by adding the autopilot_version_cb timer callback to the srv_cg callback group. That seemed to fix things even with the "hack" from 6e304258d7b289e4178cf6528fc1563760c1d612 removed. As to why that's the case I have no idea...

I wouldn't consider this entirely solved as issue https://github.com/ros2/rclcpp/issues/1707 suggests that calling a service from within a service callback isn't intended. In the future, PR https://github.com/ros2/rclcpp/pull/1709 may be of interest (now in rolling)

As for the "hack" it's probably good practice to timeout on service futures anyway as there are still ongoing efforts to improve service reliability especially around discovery at the RMW level (See https://github.com/ros2/rmw_fastrtps/issues/392 & https://github.com/ros2/rmw_cyclonedds/issues/191)