ros-controls / ros_controllers

Generic robotic controllers to accompany ros_control
http://wiki.ros.org/ros_control
BSD 3-Clause "New" or "Revised" License
547 stars 525 forks source link

Issue with sample logic for joint_trajectory_controller in trajectory_interface.h #143

Open dcconner opened 9 years ago

dcconner commented 9 years ago

https://github.com/ros-controls/ros_controllers/blob/indigo-devel/joint_trajectory_controller/include/trajectory_interface/trajectory_interface.h#L138

If the findSegment returns end(), which it does for time preceding trajectory, the function returns it; for it = trajectory.end(). This causes error in https://github.com/ros-controls/ros_controllers/blob/indigo-devel/joint_trajectory_controller/include/joint_trajectory_controller/joint_trajectory_controller_impl.h#L398

1) For preceding time should we add return trajectory.front(); at L150 of trajectory_interface.h? 2) Given operation of findSegment, how do we differentiate between a time preceding segment, and an actual failure to maintain a valid stopHold position? The logic in sample, appears to assume preceding time and cause jump in state from end to beginning.

adolfo-rt commented 9 years ago

The implemented behavior seems correct to me. The active trajectory segment is the one whose:

For segments that are not the last one, the sample time will belong to the active segment's [start_time, end_time) interval. For the last trajectory segment, we allow to sample past the waypoint's end_time. The existing segment type implements a hold last position policy after end_time.

In the joint_trajectory_controller we do not allow extrapolating into the past. There always has to be a valid segment for the current time, at least a hold position one. When there is none (eg. they all live in the future), then there's a bug somewhere, hence the error you reference:

https://github.com/ros-controls/ros_controllers/blob/indigo-devel/joint_trajectory_controller/include/joint_trajectory_controller/joint_trajectory_controller_impl.h#L398

If you're getting this error, there's likely something wrong in the controller logic. From a private conversation we had, and some experimentation I've been doing, it might be the case that some corner cases are not well handled.

dcconner commented 9 years ago

We'll be on alert for this issue with real robot; thus far I've only noticed in simulation.

adolfo-rt commented 9 years ago

145 should partially address this issue. The controller was not rejecting commands that violate the strictly-increasing requirement for waypoint times. When this happens, instead of:

Unexpected error: No trajectory defined at current time. Please contact the package maintainer.

which took place in the realtime controller update (bad), the offending trajectory command will be rejected in the non real-time callback processing (good), with an error stating:

Trajectory message contains waypoints that are not strictly increasing in time.

@dcconner mentioned offline that he was using rqt_joint_trajectory_controller and Gazebo, and that the reported error had an occurrence of once every 10-20min. The second part of this issue might be to look into the rqt plugin.

I noticed that the rqt plugin is using qt (wall-clock) timers, while the control loop runs on the simulation clock (which can skew arbitrarily). Making the plugin use rospy (hence simulation clock-driven) timers could be the missing piece.

davetcoleman commented 8 years ago

Hi guys, I'm having this issue myself and have been struggling with it for a while. I'm interrupting the joint_trajectory_controller with a new trajectory every ~30hz. It works great except 5% of the time I get the Unexpected error: No trajectory defined at.... I've read the code and your comments here but still am unsure how this could happen. Here's some relevant console output (warning I kinda start talking out loud, sorry for the length):

DEBUG : Recieved new action goal
DEBUG : Figuring out new trajectory starting at time 1446507717.012
DEBUG : Using alternate time base. In it, the new trajectory starts at time 47.950
DEBUG : Trajectory has 16 segments:
- 1 segment(s) will still be executed from previous trajectory.
- 1 segment added for transitioning between the current trajectory and first point of the input message.
- 14 new segments (15 points) taken from the input trajectory.

ERROR : Unexpected error: No trajectory defined at current time. Please contact the package maintainer.

DEBUG : Recieved new action goal
DEBUG : Figuring out new trajectory starting at time 1446507717.042
DEBUG : Using alternate time base. In it, the new trajectory starts at time 47.980
DEBUG : Trajectory has 15 segments:
- 1 segment(s) will still be executed from previous trajectory.
- 1 segment added for transitioning between the current trajectory and first point of the input message.
- 13 new segments (14 points) taken from the input trajectory.

And from my trajectory generation node (roughly MoveIt!-based) this is the start of the previous trajectory:

seq: 0
stamp: 1446507716.951572017
frame_id: /world
positions[0]
positions[]
  ...
velocities[]
  ...
accelerations[]
  ...
effort[]
time_from_start: 0.075434335

Then I send this trajectory and the error occurs:

seq: 0
stamp: 1446507717.011822401
frame_id: /world
positions[0]:
positions[]
  ...
velocities[]
  ...
accelerations[]
  ...
effort[]
time_from_start: 0.071915017

I'm not 100% clear on how the trajectory controller chooses which points to keep, but if its header.stamp + point[0].time_from_start than the first trajectories are scheduled to start at:

1446507717.03 // first trajectory
1446507717.08 // second trajectory

According to the "Alternative time base" they are also increasing:

the new trajectory starts at time 47.950
// ERROR
...
the new trajectory starts at time 47.980

That's a time diff of

0.30 s, or 3.33 hz

And my controller loop is 200hz. I don't understand

otamachan commented 8 years ago

Hi, I have a similar problem. When I send trajectories to the command topic continuously, Unexpected error: No trajectory defined at... occurs occasionally.

I'm using controller_manager in a real-time loop. It looks like something like this.

controller_manager::ControllerManager cm(&hw, nh);
ros::Rate rate(100.0);
ros::Time last = ros::Time::now();
rate.sleep();
while (ros::ok())
{
  ros::Time start = ros::Time::now();
  ros::Duration period = start - last;
  last = start;
  //hw.read();
  cm.update(start, period);
  //hw.write();
  rate.sleep();
}

The period passed to the controller manager is calculated every time, and so it has jitter.

I looked at the joint_trajectory_controller code and I found that the start time of the first trajectory segment is calculated using the previous period. (https://github.com/ros-controls/ros_controllers/blob/indigo-devel/joint_trajectory_controller/include/joint_trajectory_controller/joint_trajectory_controller_impl.h#L471). This might cause that the start time of the first trajectory is later than the next sample time when the previous period is longer(even 1us) than the current period because of the jitter. And this leads to the above error.

I don't see the correct solution of this problem, but I'm handling this problem by allowing a future start time if the difference is small.(https://github.com/otamachan/ros_controllers/commit/d3233bc05e573081df4904aac63e0dd4eda1ffd4)

I would appreciate any advice or better solution.

davetcoleman commented 8 years ago

This is working really well for me, I've gotten no errors so far!!

matthew-reynolds commented 5 years ago

Pardon the mess of references above, had some configuration issues with our fork.

It's been a few years without much word on updates to this issue, so when I ran into similar behaviour of inconsistent timing of my control loop causing trajectories to fail, I came across this issue. Adapting @otamachan's solution to work with Kinetic, I found that the issue was greatly reduced, but not completely eliminated. This was due to the limitation of the solution that otamachan pointed out, which is that the allowance of a slightly future start time was based off of the current period, which I believe is insufficient. Allow me to explain:

As otamachan described, the issue arrises when the current period is shorter than the previous period. In my case, with a 10hz clock, if something takes longer than it should in the control loop (Separate issue), I can run into cases where my period ends up being, for example, 0.12sec instead of 0.10. This causes the next period to be short, 0.08sec, to compensate and maintain an average of 10hz. This issue here is that using the current period to determine an appropriate acceptable future start time is insufficient, since the current period is much shorter than the average period and therefore not a good threshold.

I don't have a thorough enough understanding of ros controller's trajectories to know what the appropriate solution is, so I was hoping to revive this issue to discuss and potentially solve this issue. In my fork, I'm using the difference between the current and previous periods as my threshold for allowing future start times. This is working better for me than otamachan's solution, but I am unsure if it is really the 'correct' solution. Perhaps using the average period would be better than the period difference? Would appreciate some input and assistance.

bmagyar commented 5 years ago

First of all, thanks for reviving it :) Second, can you come up with a set of unit tests that reproduce the issue? That'd be probably the cleanest way of solving it. A 10Hz loop is easy to fake going off-trail from a unit test too. I am aware that this has been an issue for a long time but as it was a relatively harmless one for most platforms, it didn't gain a high severity.

In terms of solutions I think you are on track, perhaps @beatrizleon has encountered the same issue before?

beatrizleon commented 5 years ago

Sorry @bmagyar, no, as far as I know we haven't encountered the same issue before.

matthew-reynolds commented 5 years ago

@bmagyar I'm new to unit tests with ROS but I've now setup a rostest-gtest that is able to reliably reproduce the issue (See here). My problem, however, is then causing the unit test to capture this issue and fail appropriately. There's no way to detect this issue occurring using the action client, the only indication of an error is the lack of updated command in the controller due to its update being cut off.

My test follows the exact same format as joint_trajectory_controller.test and joint_trajectory_controller_wrapping.test, with a custom RobotHW node (non-gtest) and then the gtest node interacting with the controller manager through the action server.

I'm not sure if there's a good way to use the test as intended to detect the error, but it is at least a tool to reproduce the error.

mferenduros commented 5 years ago

We just ran into this too. We're submitting new trajectory goals at around 30-60Hz, and under the right conditions the joint_trajectory_controller will completely black out for a second or more. No goal feedback is emitted and the robot continues on with whatever joint velocities it had the last time that update() ran successfully, sometimes with very very bad consequences for the robot and/or the box it's trying to pick up.

@Matthew-Reynolds Your fix looks good, does it continue to behave well for you?

bmagyar commented 5 years ago

@Matthew-Reynolds could you please submit a PR with your test as first commit, then we can iterate on reviewing a fix there?

matthew-reynolds commented 5 years ago

@mferenduros Last I heard it seemed to be doing well, I'm not involved with that project anymore though so I'll confirm and get back to you.

@bmagyar I haven't looked at this for several months so I will need to spend a few days to get back up to speed. As I recall, though, the issue I was having was with actually detecting the issue. Nothing is reported back to the action client, the only ways to notice the issue are by seeing the error message in the log or by noticing that the command isn't updated. Will get back up to speed and post an update in a few days.

skohlbr commented 4 years ago

Searched for the error message and found this issue (again). Also experiencing this on 16.04/Kinetic when sending goals to controller at high frequency (>15Hz). Didn't observe major adverse effects so far, however.

matthew-reynolds commented 3 years ago

I totally dropped the ball on this.

The project I was a part of in 2019 has been using https://github.com/ros-controls/ros_controllers/commit/314c63a72b41c66bbad59d2991482f5eca29aafd for almost 3 years now without issue. Admittedly, some other changes have occured in the system which regularized the control loop period, but as I recall that patch was sufficient to avoid the issue even with the old irregular period.

@bmagyar Where do you want to go from here? In the past few years I've gotten more savvy with gtest, so perhaps I should give another crack at writing a test to reproduce this.

mzucker commented 2 years ago

I'm not sure if this is the same issue, but I wrote a test program that reliably repeats the

Unexpected error: No trajectory defined at current time. Please contact the package maintainer.

error message when run against the https://github.com/PickNikRobotics/ros_control_boilerplate simulator. The issue and steps to reproduce are described at https://github.com/PickNikRobotics/ros_control_boilerplate/issues/56.

It would be great to find out more about this error and how to avoid it.

mzucker commented 2 years ago

Over in https://github.com/PickNikRobotics/ros_control_boilerplate/issues/56#issuecomment-948564075, @RobertWilbrandt confirms independently that my code can trigger this bug, if it's any help.

aaryanmurgunde commented 1 year ago

I know I'm several years late to this party but has anyone found a solution to this ?