Deterministic simulation, lockstep run of simulation and controllers

osrf-migration commented 4 years ago

Original report (archived issue) by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).

Our controller is strictly deterministic. When we have our binary log file, we can replay what it did and get exactly the same results (verified against the log). However, the simulation environment is not deterministic, which adds yet another challenge. The non-determinism is significant enough to affect the final score. Take the final leaderboard for the Tunnel Circuit from here: https://www.darpa.mil/news-events/2019-10-30. Only on worlds B and E did we get the expected score distribution.

One step towards being able to create and run deterministic tests would be a lockstep mode, in which the simulation and the controller wait for each other. The PX4 team implemented this for Gazebo 9 and made it the default for their tests. There are many advantages to this:

- The results of the tests no longer depend on the speed of the computer running them or on the process scheduler.
- When you attach a debugger to the controller and step through your code, the simulation waits, so when you decide to continue, everything works just as if you had never stopped the execution.
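
To illustrate the idea, here is a minimal sketch of a lockstep loop. It is a toy and not the PX4 or Gazebo implementation: the 1-D point mass, the PD control law, and the names `run_lockstep` and `controller_delay_s` are all assumptions made up for this example. The simulation publishes a state, then blocks until the controller answers, so the trajectory does not depend on how slow the controller is.

```python
import queue
import threading
import time


def run_lockstep(num_steps, controller_delay_s):
    """Run a toy 1-D simulation in lockstep with a controller thread."""
    states = queue.Queue(maxsize=1)    # simulation -> controller
    commands = queue.Queue(maxsize=1)  # controller -> simulation

    def controller():
        while True:
            state = states.get()       # controller waits for the simulation
            if state is None:          # shutdown signal
                return
            # Stand-in for a slow machine or a debugger pause.
            time.sleep(controller_delay_s)
            position, velocity = state
            # Hypothetical PD law driving the position towards 1.0.
            commands.put(2.0 * (1.0 - position) - 0.5 * velocity)

    worker = threading.Thread(target=controller)
    worker.start()

    dt = 0.01
    position, velocity = 0.0, 0.0
    trajectory = []
    for _ in range(num_steps):
        states.put((position, velocity))
        force = commands.get()         # simulation waits for the controller
        velocity += force * dt
        position += velocity * dt
        trajectory.append(position)

    states.put(None)
    worker.join()
    return trajectory


fast = run_lockstep(50, controller_delay_s=0.0)
slow = run_lockstep(50, controller_delay_s=0.002)
print(fast == slow)
```

Both runs perform the same arithmetic in the same order, so the two trajectories compare equal even though the second controller is much slower in wall-clock time.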

My proposal is to implement this at least for local runs to ease development. Even better would be to implement this for the cloud simulation as well. This way we could sidestep the troubles with the speed of the AWS machines and make those runs deterministic as well.

osrf-migration commented 4 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).

set assignee_account_id to "557058:095b1e12-74ed-4e20-b44f-2f0745b616e0"
set assignee to "nkoenig (Bitbucket: nkoenig, GitHub: nkoenig)"

nkoenig commented 4 years ago

I believe we can add a lock-step mode for local testing and development.

mjcarroll commented 4 years ago

Because of the way physics and sensor measurements are generated in ignition-gazebo, we are already more "lock-step" than the previous versions, Gazebo 9 and Gazebo 11. Physics (and the rest of the simulation) may be halted to guarantee that sensor data is generated from the correct physical state of the world.

The issue arises once ROS is involved. Currently, the bridge converts the ignition messages to ROS messages, but it does not block the simulation while waiting for a response. I believe that the current best approach to accomplish lock-step with a controller would be to either:

- Introduce an Ignition Gazebo system plugin version of your controller. This would probably take some rearchitecting of your control code, and would not be compatible with cloudsim, as we don't allow custom user code in the simulation instance.
- Introduce a system plugin that blocks waiting for control input. As above, this would not work in cloudsim, and it would additionally sink your simulation performance in most cases.
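
As a rough illustration of the second option and its cost, here is a toy sketch in Python. It does not use the Ignition Gazebo plugin API: `BlockingControlSystem`, `pre_update`, `run_world`, and `slow_controller` are hypothetical names invented for this example, and the physics step is left as a placeholder. The hook that runs before each step blocks until a command arrives, so the simulation can never run faster than the controller produces commands.

```python
import queue
import threading
import time


class BlockingControlSystem:
    """Toy stand-in for a system plugin that blocks on control input."""

    def __init__(self):
        self.commands = queue.Queue()
        self.last_command = None
        self.blocked_s = 0.0  # total wall-clock time spent waiting

    def pre_update(self, step):
        start = time.perf_counter()
        self.last_command = self.commands.get()  # blocks the whole step
        self.blocked_s += time.perf_counter() - start


def run_world(system, num_steps):
    for step in range(num_steps):
        system.pre_update(step)
        # A real simulator would advance physics here using
        # system.last_command; omitted in this sketch.


def slow_controller(system, num_steps, period_s):
    # Hypothetical controller that emits one command every period_s seconds.
    for step in range(num_steps):
        time.sleep(period_s)
        system.commands.put(float(step))


system = BlockingControlSystem()
thread = threading.Thread(target=slow_controller, args=(system, 20, 0.005))

start = time.perf_counter()
thread.start()
run_world(system, 20)
thread.join()
elapsed = time.perf_counter() - start

print(f"wall time {elapsed:.3f} s, of which {system.blocked_s:.3f} s blocked")
```

With these numbers, 20 steps take at least about 0.1 s of wall-clock time, nearly all of it spent waiting, which is the performance penalty mentioned above.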

Do you think either of these routes would be sufficient for your testing/analysis?

osrf / subt

Deterministic simulation, lockstep run of simulation and controllers #257