robotology / yarp

YARP - Yet Another Robot Platform
http://www.yarp.it
Other
523 stars 195 forks source link

yarp::os::NetworkClock should handle time reset #800

Closed traversaro closed 3 years ago

traversaro commented 8 years ago

When a simulator supporting synchronization with YARP is reset, the time published on the YARP_CLOCK port jumps backward in time to 0.0.

This happens for example in Gazebo if a "Reset World" is asked (for more on this, see https://github.com/robotology/gazebo-yarp-plugins/pull/247 , https://github.com/robotology/gazebo-yarp-plugins/issues/92 ).

yarp::os::NetworkClock should be modified to account for this, to avoid having the delay methods wait for an indefinite amount of time.

A possible approach is to store in the class the latest time read, and to return all the threads blocked on a delay method if the time goes backward.

Clearly, any software that is supposed to handle time reset should be carefully written to handle this, regardless of how we handle time reset in yarp::os::NetworkClock .

traversaro commented 8 years ago

cc @arocchi

traversaro commented 8 years ago

cc @barbalberto

barbalberto commented 8 years ago

Clearly, any software that is supposed to handle time reset should be carefully written to handle this, regardless of how we handle time reset in yarp::os::NetworkClock .

Yes, that's what I'm most worried about. I guess yarp modules currently 'consuming' the time info, like control algorithms, were not written keeping in mind the event of time jumping backward.

I agree with your proposal to wake up threads when time goes backward because this will prevent deadlocks in yarp threads, but then what happens next is up to the single module. In most of the cases their behaviour will be undefined.

What a module using a watchdog is supposed to do? What a control algorithm is supposed to do when the time jumps back in between of two following runs?

More often than not, if a module uses the timing information, it'll store 'previous run' time in a internal variable and take action based on time elapsed. All those algorithms will simply not work, so just wake up the threads with an invalid time information (because at the end of the day it is an invalid time value for the consumer) will trigger more unexpected consequences then not.

IMHO, the best thing to do will be: the clock notifies the yarp::RFModule about the clock reset event using a specific method. The user can implement this reset method and do all the thing he/she believes appropriate to handle the situation and keep working. The default implementation instead will be a fatal error closing down the RFModule and telling the user the module cannot go further because it does not make sense to continue. Either an asset (yFatal) or simply by calling the RFModule::close() for a clean close down.

virtual bool RFModule::reset(yarp::os::Stamp newTime)
{
   yError() << "Something nasty happened with clock. Quitting. 
   If you are brave enough to handle time travels, please implement this function in your RFModule()."
   close();
}
francesco-romano commented 8 years ago

What if the clock plugin simply saves the current time at the reset event, and continues to increment it after?, That is, the time is not reset

barbalberto commented 8 years ago

Yes, that's also a way but then there will be de-synch between yarp and ros clock plugins

traversaro commented 8 years ago

+1 to @barbalberto proposal (but I think it can be implemented separatly in the future).

lornat75 commented 8 years ago

I am not convinced. This solution solves the problem only for the RFModule and in addition it introduces a method which is meaningful only when using simulated clock.

traversaro commented 8 years ago

Another more general solution is to offer a callback for network reset in yarp::os::Time, but again are all solution that are quite advanced and that do not preclude to just unblock all the delay, that would at least fix the wrappers used by Gazebo to publish the sensor data.

barbalberto commented 8 years ago

@traversaro idea is probably more general.

Anyway yes, unblocking all the delays is the step zero to take. Actually it is the same thing we do when switching from system clock to network clock (which, by the way, implies the time goes backward) so maybe it could work well by his own.

lornat75 commented 8 years ago

+1

drdanz commented 8 years ago

Would it make sense to change delay from void to bool and return false on reset and on clock switch?

traversaro commented 7 years ago

@barbalberto just to clarify: if I understood correctly, https://github.com/robotology/yarp/pull/1277 has no changes related to this , right?

barbalberto commented 7 years ago

Correct, this is a desiderata, but with less priority. https://github.com/robotology/yarp/pull/1277 addresses most critical bugs, while adding new feature can be a further step. This is to avoid too much updates at the same time in a critical part as the clock is.

traversaro commented 3 years ago

As this was recently affected people using YARP_CLOCK in Gazebo simulations (@Giulero), I'll do a short recap of the welcome (correct me if I am wrong @drdanz) solutions, in order of complexity :

Both 1. and 2. are welcome solution, and both can be implemented, even in different PRs.

traversaro commented 3 years ago

@Giulero if you want to test, a basic implementation of 1. from previous comment is https://github.com/robotology/yarp/compare/master...traversaro:fix/800/1 .

drdanz commented 3 years ago

I'm not sure about the whole thing here. The system clock itself is not guaranteed to be monotonic, I wonder if when the system clock is adjusted, something similar should happen. Another idea would be to have a monotonic "SteadyNetworkClock" that somehow handles this.

traversaro commented 3 years ago

The system clock itself is not guaranteed to be monotonic, I wonder if when the system clock is adjusted, something similar should happen.

The main difference w.r.t. to the system clock, is that in the case of the SystemClock the now and the delay are not related in any way, so even if std::chrono::high_resolution_clock (used in now, see https://github.com/robotology/yarp/blob/9b52a71e176e23ebe42956cf5b23d8ae40b255f0/src/libYARP_os/src/yarp/os/SystemClock.cpp#L39) is non monotonic, then the delay uses std::this_thread::sleep_for (https://github.com/robotology/yarp/blob/9b52a71e176e23ebe42956cf5b23d8ae40b255f0/src/libYARP_os/src/yarp/os/SystemClock.cpp) that should work as expected regardless of the value of now. Instead, in the NetworkClock the delay has that specific implementation that is a function of the value read from the network clock, and for which the unblock logic shown in https://github.com/robotology/yarp/issues/800#issuecomment-775252095 is necessary.

Giulero commented 3 years ago

Thanks @traversaro! I tested your fix and it seems to work :)

drdanz commented 3 years ago

Fixed by @Giulero in #2494