Closed traversaro closed 3 years ago
cc @arocchi
cc @barbalberto
Clearly, any software that is supposed to handle time reset should be carefully written to handle this, regardless of how we handle time reset in yarp::os::NetworkClock .
Yes, that's what I'm most worried about. I guess yarp modules currently 'consuming' the time info, like control algorithms, were not written keeping in mind the event of time jumping backward.
I agree with your proposal to wake up threads when time goes backward because this will prevent deadlocks in yarp threads, but then what happens next is up to the single module. In most of the cases their behaviour will be undefined.
What a module using a watchdog is supposed to do? What a control algorithm is supposed to do when the time jumps back in between of two following runs?
More often than not, if a module uses the timing information, it'll store 'previous run' time in a internal variable and take action based on time elapsed. All those algorithms will simply not work, so just wake up the threads with an invalid time information (because at the end of the day it is an invalid time value for the consumer) will trigger more unexpected consequences then not.
IMHO, the best thing to do will be:
the clock notifies the yarp::RFModule about the clock reset
event using a specific method. The user can implement this reset
method and do all the thing he/she believes appropriate to handle the situation and keep working.
The default implementation instead will be a fatal error closing down the RFModule
and telling the user the module cannot go further because it does not make sense to continue.
Either an asset (yFatal) or simply by calling the RFModule::close()
for a clean close down.
virtual bool RFModule::reset(yarp::os::Stamp newTime)
{
yError() << "Something nasty happened with clock. Quitting.
If you are brave enough to handle time travels, please implement this function in your RFModule()."
close();
}
What if the clock plugin simply saves the current time at the reset event, and continues to increment it after?, That is, the time is not reset
Yes, that's also a way but then there will be de-synch between yarp and ros clock plugins
+1 to @barbalberto proposal (but I think it can be implemented separatly in the future).
I am not convinced. This solution solves the problem only for the RFModule and in addition it introduces a method which is meaningful only when using simulated clock.
Another more general solution is to offer a callback for network reset in yarp::os::Time
, but again are all solution that are quite advanced and that do not preclude to just unblock all the delay, that would at least fix the wrappers used by Gazebo to publish the sensor data.
@traversaro idea is probably more general.
Anyway yes, unblocking all the delays is the step zero to take. Actually it is the same thing we do when switching from system clock to network clock (which, by the way, implies the time goes backward) so maybe it could work well by his own.
+1
Would it make sense to change delay
from void
to bool
and return false on reset and on clock switch?
@barbalberto just to clarify: if I understood correctly, https://github.com/robotology/yarp/pull/1277 has no changes related to this , right?
Correct, this is a desiderata, but with less priority. https://github.com/robotology/yarp/pull/1277 addresses most critical bugs, while adding new feature can be a further step. This is to avoid too much updates at the same time in a critical part as the clock is.
As this was recently affected people using YARP_CLOCK in Gazebo simulations (@Giulero), I'll do a short recap of the welcome (correct me if I am wrong @drdanz) solutions, in order of complexity :
delay
should be unblocked, to avoid the mysterious phenomena of threads suddenly stopping and resuming after the time at which the clock was previously reset has passedBoth 1. and 2. are welcome solution, and both can be implemented, even in different PRs.
@Giulero if you want to test, a basic implementation of 1. from previous comment is https://github.com/robotology/yarp/compare/master...traversaro:fix/800/1 .
I'm not sure about the whole thing here. The system clock itself is not guaranteed to be monotonic, I wonder if when the system clock is adjusted, something similar should happen. Another idea would be to have a monotonic "SteadyNetworkClock" that somehow handles this.
The system clock itself is not guaranteed to be monotonic, I wonder if when the system clock is adjusted, something similar should happen.
The main difference w.r.t. to the system clock, is that in the case of the SystemClock the now and the delay are not related in any way, so even if std::chrono::high_resolution_clock
(used in now, see https://github.com/robotology/yarp/blob/9b52a71e176e23ebe42956cf5b23d8ae40b255f0/src/libYARP_os/src/yarp/os/SystemClock.cpp#L39) is non monotonic, then the delay uses std::this_thread::sleep_for
(https://github.com/robotology/yarp/blob/9b52a71e176e23ebe42956cf5b23d8ae40b255f0/src/libYARP_os/src/yarp/os/SystemClock.cpp) that should work as expected regardless of the value of now. Instead, in the NetworkClock the delay has that specific implementation that is a function of the value read from the network clock, and for which the unblock logic shown in https://github.com/robotology/yarp/issues/800#issuecomment-775252095 is necessary.
Thanks @traversaro! I tested your fix and it seems to work :)
Fixed by @Giulero in #2494
When a simulator supporting synchronization with YARP is reset, the time published on the
YARP_CLOCK
port jumps backward in time to0.0
.This happens for example in Gazebo if a "Reset World" is asked (for more on this, see https://github.com/robotology/gazebo-yarp-plugins/pull/247 , https://github.com/robotology/gazebo-yarp-plugins/issues/92 ).
yarp::os::NetworkClock
should be modified to account for this, to avoid having thedelay
methods wait for an indefinite amount of time.A possible approach is to store in the class the latest time read, and to return all the threads blocked on a
delay
method if the time goes backward.Clearly, any software that is supposed to handle time reset should be carefully written to handle this, regardless of how we handle time reset in
yarp::os::NetworkClock
.