Closed benjinne closed 1 year ago
Thanks for the bug report, @benjinne. I'm going to see if I can reproduce this. @MiguelCompany, have you seen anything like this?
@Barry-Xu-2018 @iuhilnehc-ynos any idea? sound really critical issue.
I can't reproduce this problem.
I tried using simple applications to get now() from a node with use_sim_time
enable, but it seems work as I expected.
# terminal 1 (add a test application that a publisher publishes '/clock' with `rosgraph_msgs.msg.Clock` starting from 1)
ros2 run demo_nodes_py timer
# terminal 2 (update the callback to call `this->now()`)
ros2 run demo_nodes_cpp talker --ros-args -p use_sim_time:=True
or
ros2 run demo_nodes_cpp talker ( use terminal 3 to call `ros2 param set /talker use_sim_time True` later )
# terminal 2 log
[INFO] [1664331704.865874420] [talker]: Publishing: 'Hello World: 1' time: 1664331704865865605
[INFO] [1664331705.865777557] [talker]: Publishing: 'Hello World: 2' time: 1664331705865774037
[INFO] [1664331706.865919796] [talker]: Publishing: 'Hello World: 3' time: 1664331706865913545
[INFO] [1664331707.865875875] [talker]: Publishing: 'Hello World: 4' time: 1664331707865871912
[INFO] [1664331708.865777110] [talker]: Publishing: 'Hello World: 5' time: 1664331708865773788
[INFO] [1664331709.865772521] [talker]: Publishing: 'Hello World: 6' time: 1664331709865769315
[INFO] [1664331710.866109847] [talker]: Publishing: 'Hello World: 7' time: 0
[INFO] [1664331711.865835457] [talker]: Publishing: 'Hello World: 8' time: 15000000000
[INFO] [1664331712.866043628] [talker]: Publishing: 'Hello World: 9' time: 16000000000
[INFO] [1664331713.865918251] [talker]: Publishing: 'Hello World: 10' time: 17000000000
...
[INFO] [1664331873.866060468] [talker]: Publishing: 'Hello World: 170' time: 177000000000 # check it more than 2 minutes
The backtrace correctly shows to get the time if using simulator time for the node. I have no idea what kind of situation could block calling the now().
```cpp
(gdb) bt
#0 rcl_get_ros_time (data=0x555555767760, current_time=0x7fffffffab68) at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rcl/rcl/src/rcl/time.c:71
ready to use `rcutils_atomic_load_uint64_t` get the `current` from `t->current_time`
#1 0x00007ffff6cabb9b in rcl_clock_get_now (clock=0x5555557676b0, time_point_value=0x7fffffffab68)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rcl/rcl/src/rcl/time.c:261
#2 0x00007ffff7bd32f7 in rclcpp::Clock::now (this=0x555555767690)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/clock.cpp:74
#3 0x00007ffff7c3fe9a in rclcpp::Node::now (this=0x555555602c40)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/node.cpp:544
#4 0x00007ffff4302dee in demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}::operator()() const (__closure=0x5555558a35c0)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/demos/demo_nodes_cpp/src/topics/talker.cpp:48
#5 0x00007ffff430dbb6 in rclcpp::GenericTimer
@audrow can you reproduce this issue?
I haven't been able to reproduce it. I tried for a good while yesterday.
Anyone have tips on how I can debug a callback to see why this->now() gets stuck and keeps repeating the same value after a few minutes? I'm able to debug it with gdb but I don't know what to look for.
@benjinne can you provide the reproducible test package on this?
I'm able to debug it with gdb but I don't know what to look for.
if you attach the gdb on that process or get the core file, how about checking the stack trace on that stuck thread?
@fujitatomoya I don't have a reproducible test package, but I was able to get the stacktrace of the node that has an issue with this->now() returning with the same repeated value even though the /clock topic is updating.
I think thread 12 is listening to the clock topic and enters deadlock somehow
Bug report
Required Info:
Steps to reproduce issue
After #619 and #633 merged into humble, my simulation became very unreliable. It worked well before that PR, so I'm wondering if anyone else is having issues with it.
I don't have a good example yet, but with
use_sim_time
I have some callbacks that eventually stop updating about a minute after launching. Other times, in my callbackthis->now()
freezes and stops updating even though the /clock topic is always updating. For example I have a callback from one subscription which transforms a message into another format then grabsthis->now()
and publishes it into a new topic. Since the time stops updating on the output of that callback, it breaks my simulation.I've tried using various callback groups and the multi-threaded executor, but nothing seems to help.
Expected behavior
this->now() with use_sim_time should always update and never freeze