ros / roscpp_core

ros distribution sandbox
89 stars 116 forks source link

[rostime] add SteadyTime #57

Closed flixr closed 7 years ago

flixr commented 7 years ago

Add a new SteadyTime wich uses a steady/monotonic clock to prevent issues with time jumps.

Replacement for #55 with MonotonicTime renamed to SteadyTime and added Windows implementation.

flixr commented 7 years ago

I didn't want to use the C++11 features here as I'm not sure if we can/should use them for roscpp yet. And I'm hoping this will also applied to indigo (we are currently stuck on Ubuntu 14.04 on the Nvidia Jetson, as there are no cuda drivers for newer Ubuntu versions), where a C++11 compatible compiler is not yet required.

flixr commented 7 years ago

Fixed the whitespace stuff... will look into creating unit tests, although I have no idea on how to check if time is really monotonic, since I don't think we can reset the system time in the tests...

flixr commented 7 years ago

Anything else to do here?

flixr commented 7 years ago

Is it OK to use WallDuration with SteadyTime or should we add a SteadyDuration as well? There would be no difference between WallDuration and SteadyDuration though...

dirk-thomas commented 7 years ago

I think most importantly this needs to be tested in context of the full stack up to bond_core to make sure it satisfies the desired use case. @mikaelarguedas FYI

mikaelarguedas commented 7 years ago

I tested the changes to use SteadyTimer all the way up to bond_core. It works as expected (bonds don't break on forward time jumps, but if you jump back in time the bond will stay alive until the deadline is reached). I did notice a few unexpected nodelet crashes when using this so I'll do some more testing on Monday before +1 ing this.

mikaelarguedas commented 7 years ago

Update: Looks like there are still situations where time jumps cause this to fail:

[DEBUG] [1490639010.653929481]: SteadyTimer deregistering callbacks.
[DEBUG] [1490643600.000426555]: Time jumped forward by [25.143854] for timer of period [1.000000], resetting timer (current=1524518.175054, next_expected=1524493.031200)
[DEBUG] [1490643600.000492923]: Time jumped forward by [25.042610] for timer of period [1.000000], resetting timer (current=1524518.175106, next_expected=1524493.132497)
[ INFO] [1490643600.087787003]: Bond broken, exiting

It shows up only with nodelets but not with simple processes tied together with a bond. Needs more investigation.

Can be reproduced by running:

roslaunch nodelet_tutorial_math plus.launch

and jumping system time forward:

sudo date --set="2017-03-28 14:00:00.000"
flixr commented 7 years ago

@mikaelarguedas thanks a lot for testing!

The SteadyTime class of this PR should be ok, but I added a fix for the SteadyTimer in https://github.com/ros/ros_comm/pull/1014

flixr commented 7 years ago

@mikaelarguedas any updates here? We have these 3 patchsets running here on all our devices and haven't encountered any problems so far.

mikaelarguedas commented 7 years ago

@flixr Sorry didn't have time to do another round of testing on this. Did you perform tests with nodelets as well or only on simple bond connection ? (IIRC I faced problem only with nodelets and multiple forward time jumps)

I'll try to test this today or tomorrow and will report here

flixr commented 7 years ago

I tested the SteadyTimer callbacks directly and nodelets...

mikaelarguedas commented 7 years ago

I tested the same here (cb, bonds only and nodelets) and couldn't make if fail. :clap: :tada: thanks for pounding on this that's a great improvement! :rocket: @dirk-thomas What is the porting strategy you consider here ? Do you consider back-porting this to all active distros ?

dirk-thomas commented 7 years ago

The change in this repo is easier since it only adds API. The PRs in other repos building on top of this are more challenging since they change behavior. Therefore I plan to roll them out distro by distro. So first Lunar and Kinetic, waiting for PRs on top to also be released, wait until all changes are in the public repo and let it soak there for some time. Then consider doing the backports for Indigo and Jade.

flixr commented 7 years ago

@dirk-thomas I would propose to merge this as is... If you want further changes in the windows implementation, it should IMHO be done in a separate PR as the same should be applied to the already existing ros_walltime function and tested by someone on Windows...

flixr commented 7 years ago

@dirk-thomas any updates here? I can't really test the windows code... but we have these patches running on our systems for a while now without any problems, so is there any reason to not merge this yet?

flixr commented 7 years ago

@dirk-thomas @mikaelarguedas ping...

dirk-thomas commented 7 years ago

Sorry for the delay - very busy times :worried:

I will merge this and release a new version into Lunar in order to test the ros_comm PR.

dirk-thomas commented 7 years ago

@flixr Thank you for working on this feature!