Open blakermchale opened 3 years ago
I seen that service call problems, but don't really know what's wrong. It could be very random, any plugin can become silent. I also seen that sometimes topics also can stop working. I suspect that problem related with networking, but i blamed docker...
Note that in mavros v2 each plugin has a separate node instance, so on graph you'll see lots of nodes like <uas>.<plugin>
.
That fact likely resulting in that only some plugins become silent.
Diagnostics exists, just command should be like ros2 topic echo /diagnostics
.
Check ID - ros2 run mavros mav checkid
.
I added diagnostics and checkid to my initial post. I also noticed that checkid does not support namespacing outside of the default.
The topics for sys
all work fine for publishing at least. I tried sending a message to /mavros/statustext/send
and found the callback was never called, similar to the services.
Try to use that option: https://github.com/mavlink/mavros/blob/f1dad6c21b767ef2d3e1bff57952defc318c49bf/mavros/mavros/cmd/__init__.py#L76-L80
Note that it should be passed to mav
(that's how click works...). E.g. ros2 run mavros mav --mavros-ns /some/uas2 checkid
.
Since services build on top of DDS topics, i assume that we would have problems with both subscribers and service servers. In simplification client sends request to request topic, which service server is subscribed to, and waits a response on response topic, which newer came.
I saw problems even with ros2 topic list
ain't showing anything, but still suppose that there may be problems with udp multicast in my network / docker network.
I don't do much on Galactic, where they switched default to another DDS impl. Maybe it's worth to try?
Thanks for that info. I actually found that passing a namespace to mavros_node
does not append a a namespace to the uas
topics. For example, these are the topics I see for uas
even though all other mavros
topics are /drone_0/mavros/...
:
/uas1/mavlink_sink
/uas1/mavlink_source
Even with using the --mavros-ns
argument it still doesn't work with namespaces.
I've been working in Foxy mostly but tried Galactic today and it did not seem to change the results. I still couldn't get the service callback to work.
Ah, yes, that's because: https://github.com/mavlink/mavros/blob/f1dad6c21b767ef2d3e1bff57952defc318c49bf/mavros/src/mavros_node.cpp#L50
But with multi-drone scenario you don't have to use mavros_node. You need to start one Router (read ros2 composite nodes tutorial), and instruct it to connect to drones and uas.
mavros_router:
ros__parameters:
fcu_urls:
- udp://:14550@
- udp://:14551@
uas_urls:
- /uas1
- /uas2
Then you can run 2 (or more) UAS with different names (namespaces). Each of them need to have it's own uas link.
Please note that you shuld use different system ids on each system, otherwise you may get wide list of problems.
Hi! @vooon @blakermchale . I'm facing the same issue on ROS2 Galactic. Have you made any progress on this?
@orchydea I haven't found any solution yet and have primarily used RTPS instead as a replacement to mavros
That's a shame @blakermchale. I will try using RTPS or ros1_brige
. I ran mavros
on noetic and this issue does not seem to happen.
Thanks!
+1 I'm having this issue on ROS2 Foxy as well. Might have to figure out the ros1_bridge
...
@IanBurwell I ended up porting everything to ROS1 (Noetic). I would recommend to do so if your project is at an early stage. It's a pity that this does not work on ROS2 :cry:
I've possibly found the workaround of just sending raw MavLink commands, such as the following to force RTL. This is for sure a workaround but I am committed to using ROS2. Worst comes to worst I set up ros1_bridge
.
ros2 service call /mavros/cmd/command mavros_msgs/srv/CommandLong "{broadcast: false, command: 176, confirmation: 0, param1: 157, param2: 4, param3: 5, param4: 0.0, param5: 0.0, param6: 0.0, param7: 0.0}"
I'm facing the same problem, Ros2 Foxy. when publishing to the topic /mavros/rc/override, the topic of /mavros/rc/override updated but /mavros/rc/in or out doesn't. The plugin rc_io created, added to executor and initialized. Do you have an idea how to fix that?
Nope. But i'm thinking to try use separate executor for each plugin.
I have an issue which I think is related to this one
When doing ros2 param list
after MAVROS2 booted completely/correctly I have an error AttributeError: 'NoneType' object has no attribute 'result'
If it's unrelated I can open a separate issue.
More details of the command output:
# ros2 param list
/mavros:
component_id
diagnostic_updater.period
fcu_protocol
plugin_allowlist
plugin_denylist
qos_overrides./parameter_events.publisher.depth
qos_overrides./parameter_events.publisher.durability
qos_overrides./parameter_events.publisher.history
qos_overrides./parameter_events.publisher.reliability
qos_overrides./tf.publisher.depth
qos_overrides./tf.publisher.durability
qos_overrides./tf.publisher.history
qos_overrides./tf.publisher.reliability
qos_overrides./tf_static.publisher.depth
qos_overrides./tf_static.publisher.history
qos_overrides./tf_static.publisher.reliability
system_id
target_component_id
target_system_id
uas_url
use_sim_time
/mavros_node:
fcu_url
gcs_url
qos_overrides./parameter_events.publisher.depth
qos_overrides./parameter_events.publisher.durability
qos_overrides./parameter_events.publisher.history
qos_overrides./parameter_events.publisher.reliability
tgt_component
tgt_system
use_sim_time
/mavros_router:
diagnostic_updater.period
fcu_urls
gcs_urls
qos_overrides./parameter_events.publisher.depth
qos_overrides./parameter_events.publisher.durability
qos_overrides./parameter_events.publisher.history
qos_overrides./parameter_events.publisher.reliability
uas_urls
use_sim_time
Exception while calling service of node '/mavros/sys': None
Traceback (most recent call last):
File "/opt/ros/galactic/bin/ros2", line 11, in <module>
load_entry_point('ros2cli==0.13.1', 'console_scripts', 'ros2')()
File "/opt/ros/galactic/lib/python3.8/site-packages/ros2cli/cli.py", line 67, in main
rc = extension.main(parser=parser, args=args)
File "/opt/ros/galactic/lib/python3.8/site-packages/ros2param/command/param.py", line 39, in main
return extension.main(args=args)
File "/opt/ros/galactic/lib/python3.8/site-packages/ros2param/verb/list.py", line 116, in main
sorted_names = sorted(response.result.names)
AttributeError: 'NoneType' object has no attribute 'result'
@vooon
Nope. But i'm thinking to try use separate executor for each plugin.
Did you have any success doing this? Could this be achieved by converting the sys status plugin into its own node and adding it to its own executor?
@Zaaler sorry, i hadn't had time to try that yet.
No, it does not have any sense to strip out any plugin. By design they must be loaded to UAS node, which provides some shared state and other abstractions.
I'm having the same problem. mavros
running on Foxy in a Docker container, latest build from source.
FROM ros:foxy-ros-core-focal
RUN apt-get update && apt-get install -qy --no-install-recommends \
python3-colcon-common-extensions \
build-essential gcc git \
iputils-ping \
geographiclib-tools \
python3-vcstool python3-rosinstall-generator python3-osrf-pycommon python3-rosdep
SHELL ["/bin/bash", "-c"]
RUN mkdir -p /ros2_ws/src
WORKDIR /ros2_ws
RUN rosdep init && rosdep update
# install mavros from source
RUN git config --global advice.detachedHead false
RUN rosinstall_generator --format repos mavlink | tee /tmp/mavlink.repos && \
rosinstall_generator --format repos --upstream-development mavros | tee -a /tmp/mavros.repos && \
vcs import src < /tmp/mavlink.repos && \
vcs import src < /tmp/mavros.repos && \
rosdep install --from-paths src --ignore-src -y && \
./src/mavros/mavros/scripts/install_geographiclib_datasets.sh
RUN /bin/bash -c "source /opt/ros/foxy/setup.bash && colcon build"
# ADD ./src /ros2_ws/src
RUN /bin/bash -c "source /opt/ros/foxy/setup.bash && colcon build"
ADD ./params.yaml .
ENV ROS_DOMAIN_ID=42
CMD /bin/bash -c "source install/setup.bash && ros2 run mavros mavros_node --ros-args -r __ns:=/copter --params-file params.yaml"
params.yaml
contents:
/**:
ros__parameters:
fcu_url: udp://copter:14551@iris:14555
Simulated vehicle is running ArduCopter-4.1 SITL in a separate iris
container.
Subscribing to global_position
and imu
works fine.
I can arm the vehicle by calling cmd/arming
.
But set_mode
never returns.
Are there any configurations for which this does work? Or is this just broken for now?
I've been debugging. If you comment out any callback that makes a client to the cmd/command service name, the system works fine. Those two callbacks are the autopilot_version_cb and the set_message_interval_cb. I plan on continuing to debug it hopefully today. I would assume any plugin that creates a client with cmd/command will cause that plugin to start to hang. From a grep, I found that includes the home_position, sys_status, and mount_control plugins in cpp.
Maybe @vooon has some insight from this information. It does sound like those plugins are failing to spin which is causing the client to never receive the response. From placing a print in the cmd/command service, I know that service is processing the request and attempting to send back a response. Just seems like the client end isn't continuing to try and receive it.
Any thoughts? Also, where does this issue's importance currently rank in the ROS2 development of MAVROS?
Thanks
Maybe i messed up with spinning futures? Unsure. Also in most recent versions of ROS2 it's possible to write async service server, which may be helpful for services like message_interval, which clearly translates to command_long.
That bug is a show stopper. So yes, it's important :)
@Zaaler i'm not sure what i've changed, but now i can receive topics (status, diag, imu), call some of services (e.g. params pull).
Sometimes i have troubles with /mavros/set_mode
using ros2 service call
, but now it works fine with mav sys mode
.
@vooon I see that the mavros node can be brought up and some of the services can be used now. However, if I bring up the node, and call the set_message_interval service, it hangs still. Once the hanging occurs, all other services can no longer be called like set_mode. Situation seems to persist.
Are the times that you have trouble using /mavros/set_mode after you tried to call a service that doesn't return?
Yes, i tried to use mav sys
after hanging (and killing) ros2 service
.
But i'm unsure whether it be related to amount of runner threads or not. In my case executor uses 8 threads.
I've added thread count report in startup.
I'm still unsure how properly use futures because rclcpp2::spin_future...() throws error saying that the node is already spinning (by UAS::exec, yes).
I found this issue after trying to use the set_mode service and seeing no service responds.
After reading @Zaaler comments and @voon responses, I checked the sys_status.cpp plugin code (ros2 version 2.0.3) and found that in set_message_interval_cb() and in autopilot_version_cb() an asynchronous service request is done to "cmd/command" service and then the code waits for the future response, I talking about code like this: (lines 1083- 1085 on sys_status.cpp) auto future = client->async_send_request(cmdrq); auto response = future.get(); ret = response->success;
I think there is problem with the future.get() : if the requested service run on the same thread as the requester, then this will never happen because the node that needs to handle the request did not received yet the request, causing the thread that runs both services to get stuck, for more details see: Synchronous vs. asynchronous service clients — ROS 2 Documentation: Foxy documentation: [https://docs.ros.org/en/foxy/How-To-Guides/Sync-Vs-Async.html#id3]
This can be checked by adding a future status check with timeout regarding the future status, the code after the change: auto future = client->async_send_request(cmdrq); const auto futureStatus = future.wait_for(1s); if (futureStatus == std::future_status::ready) { auto response = future.get(); ret = response->success; }
I added such a code to both methods: set_message_interval_cb() and in autopilot_version_cb(), I found that the problem was due to autopilot_version_cb() that is called from a periodic timer, it get stuck and then the SystemStatusPlugin “sys” plugin services will not respond because the thread running the node’s spinner is stuck. After the change autopilot_version_cb() logs the error: VER: command plugin service call failed! Then if you try to use the set_mode service it responds.
The changes I did to the code are not a full solution to the problem, just a way to avoid this plugin to get stuck. I think that a full solution requires assuring that plugins using services from other plugins have spinners running on different threads, one way to achieve this is to use mutually exclusive callback groups, please see: Executors — ROS 2 Documentation: Foxy documentation [https://docs.ros.org/en/foxy/Concepts/About-Executors.html?highlight=callback%20groups] and the example: [examples/rclcpp/executors/cbg_executor]
@Michel1968 current ros2 branch has group set for command plugin. But all plugins share same multithreaded runner. Also each plugin is separate node, so i'm not quire sure that we can use sg of one node for clients from another one.
Wait_for looks more like a hack, as i suppose it still blocks. But definitely better than complete hung. Can you please make a PR?
@voon
After playing around with this, I found I was able to reliably prevent the "cmd/command
futures not returning" error by adding the autopilot_version_cb
timer callback to the srv_cg
callback group. That seemed to fix things even with the "hack" from 6e304258d7b289e4178cf6528fc1563760c1d612 removed. As to why that's the case I have no idea...
I wouldn't consider this entirely solved as issue https://github.com/ros2/rclcpp/issues/1707 suggests that calling a service from within a service callback isn't intended. In the future, PR https://github.com/ros2/rclcpp/pull/1709 may be of interest (now in rolling)
As for the "hack" it's probably good practice to timeout on service futures anyway as there are still ongoing efforts to improve service reliability especially around discovery at the RMW level (See https://github.com/ros2/rmw_fastrtps/issues/392 & https://github.com/ros2/rmw_cyclonedds/issues/191)
Issue details
I've built
mavros
in ROS Foxy and am running it usingmavros_node
. Services like~/arming
work for thecommand
plugin. However, for thesys_status
plugin I cannot call any of the services likeset_mode
orvehicle_info_get
. I am using the command:This service call ends up hanging and never returning. It just says this message:
I've added in rclcpp log info messages to the service callbacks and determined they are never being entered since the messages are not logged.
MAVROS version and platform
Mavros: 2.0.3 ROS: Foxy Ubuntu: 20.04
Autopilot type and version
[ ] ArduPilot [X] PX4
Version: master or 9d3e18dbe10700a13f3048f2807dc4b1a62eee62
Node logs
Diagnostics
Check ID