ros-simulation / gazebo_ros_pkgs

Wrappers, tools and additional APIs for using ROS with Gazebo
http://wiki.ros.org/gazebo_ros_pkgs

[gazebo_ros]: 'Step control' of ROS-Gazebo for ROS-based Reinforcement Learning #1268

Open. alikureishy opened this issue 3 years ago.

alikureishy commented 3 years ago

Description: This is a feature request to support step control of Gazebo via ROS. Gazebo itself offers step control through the gazebo::msgs::WorldControl message, but that functionality is not currently exposed to ROS nodes by the gazebo_ros_api plugin.

Motivation: This is mostly in the context of ROS-based machine-learning systems (specifically reinforcement learning, but applicable to ML generally). In RL, agents must be able to associate actions with their resulting observations and rewards as part of the reward-based feedback loop. However, since Gazebo and ROS (indeed, all nodes within ROS) run asynchronously with respect to one another, it is currently not possible to form time-synchronized or time-quantized action-state-reward associations, which affects both the training and inference stages of an RL system.

alikureishy commented 3 years ago

This issue was raised earlier, in 2016 (https://github.com/ros-simulation/gazebo_ros_pkgs/issues/489). A workaround was suggested in the comments there, but I think this can be fixed directly in the gazebo_ros plugin. I have also made the feature request in this ticket somewhat more comprehensive.

alikureishy commented 3 years ago

If this feature is acceptable, merging it into ROS1 might also allow it to be picked up automatically by any pending ROS2 port of gazebo_ros_api_plugin.hpp/cpp.

alikureishy commented 3 years ago

If you would like to see this feature, please upvote the original description above, or add your thoughts in a comment.

alikureishy commented 3 years ago

I implemented a workaround PR to unblock myself, and am posting it here in case others find it helpful too.

Kettenhoax commented 3 years ago

I have a similar issue: I am trying to compare the simulated pose and velocity of a vehicle with recorded poses and velocities. The original vehicle commands (for instance, Twist messages) are in the same recording, and I could align the response by timestamp if Gazebo operated in the same time frame as the recorded ROS(2) bag. My primary goal is to verify the accuracy of the simulation.

In contrast to the suggested implementation, the most convenient way for me to accomplish this would be for gazebo_ros to subscribe to the /clock topic published by ros2 bag play and step the simulation accordingly. I realize that the published clock increments would have to be constrained to multiples of Gazebo's max_step_size.
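To make that quantization concrete, here is a minimal sketch (plain Python, no ROS dependency) of how a hypothetical /clock-following mode might turn each incoming clock stamp into a whole number of Gazebo physics iterations. The function names are made up for illustration and are not a gazebo_ros API; flooring (never overshooting the stamp) and carrying the remainder into the next message is just one possible policy.

```python
import math
from typing import Iterable, List


def steps_to_reach(sim_time: float, target_time: float, max_step_size: float) -> int:
    """Whole physics iterations to advance toward target_time without overshooting it."""
    if max_step_size <= 0:
        raise ValueError("max_step_size must be positive")
    if target_time <= sim_time:
        return 0  # never step backwards; wait for a later /clock stamp
    # The small epsilon absorbs float round-off (e.g. a quotient of 299.9999999 where 300 is meant).
    return math.floor((target_time - sim_time) / max_step_size + 1e-9)


def plan_steps(clock_stamps: Iterable[float], max_step_size: float,
               start: float = 0.0) -> List[int]:
    """Iterations to run after each /clock stamp; any sub-step remainder carries forward."""
    total = 0  # integer iteration count avoids accumulating float error in sim time
    plan = []
    for stamp in clock_stamps:
        n = steps_to_reach(start + total * max_step_size, stamp, max_step_size)
        total += n
        plan.append(n)
    return plan
```

For example, with max_step_size = 0.001 the stamps 0.0105, 0.02, 0.0201 and 0.05 map to 10, 10, 0 and 30 iterations: the half step left over from the first stamp is absorbed by the second, and a stamp that moves less than one step triggers no physics update at all.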

Even though this approach seems out of place given this issue's current wording, my use case still essentially requires some form of external simulation stepping. I just wanted to add further motivation for this issue.

alikureishy commented 3 years ago

Thanks for the comment, @Kettenhoax! Your suggestion is actually very relevant to this issue. In fact, I second it, and will shortly reword this issue to reflect it. A feature that makes Gazebo respect (subscribe to) the /clock topic, enabled via a new Gazebo-specific parameter (such as "subscribe_to_clock" or "time_slave"), would work for any situation requiring external control of the system clock, including my use case. The one complication, as you mentioned, is defining the appropriate behavior for Gazebo when /clock messages fall within a max_step_size boundary. It's also possible this would require a supporting change in the Gazebo-sim repository. If I have some free time, I'll investigate this approach further, unless the community suggests alternative solutions before then.

Meanwhile, absent the ideal solution, the PR I submitted could potentially work (with a small hack) for your use case as well, provided your analysis is not sensitive to the max_step_size boundary. If you are blocked on this, let me know and I will elaborate.

alikureishy commented 3 years ago

Potential approaches:

1) A 'step control' API on gazebo_ros that ROS nodes (or scripts/actions/services) can invoke directly. An ML agent (running as a ROS node, for example) could then train on cleanly synchronized and quantized action-state-reward tuples generated by the system at each 'step'.

2) Alternatively, let ROS and Gazebo both follow '/clock'. For example, a "use_external_clock" parameter would make gazebosim (perhaps via gazebo_ros as a conduit) follow the /clock (or another) topic instead of generating its own ticks. This probably requires more breaking changes (across gazebo_ros and gazebosim, and potentially ROS too) than #1. But, as per @Kettenhoax's note above, it could be a more universal solution.
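As a rough illustration of approach 1, the sketch below shows how a gym-like environment could pair each action with its observation and reward if a step-control call existed. Everything here is hypothetical: FakeSteppedSim stands in for Gazebo behind a step_control(n) service, and the 1-D point-mass dynamics and reward are placeholders. In a real setup, step_control would be a ROS service call and the observation would come from topics read while physics is paused.

```python
from typing import Tuple


class FakeSteppedSim:
    """Hypothetical stand-in for Gazebo driven through a step_control(n) service.

    Models a 1-D point mass whose velocity is set by the action. Physics only
    advances inside step_control(); between calls the world stays paused.
    """

    def __init__(self, max_step_size: float = 0.001) -> None:
        self.max_step_size = max_step_size
        self.iterations = 0  # total physics iterations run so far
        self.position = 0.0
        self.velocity = 0.0

    def step_control(self, n: int) -> None:
        # Advance exactly n physics iterations, then remain paused.
        for _ in range(n):
            self.position += self.velocity * self.max_step_size
        self.iterations += n

    @property
    def sim_time(self) -> float:
        return self.iterations * self.max_step_size


class SteppedEnv:
    """Gym-like wrapper: one env step = apply action, advance n iterations, observe."""

    def __init__(self, sim: FakeSteppedSim, iterations_per_step: int = 100,
                 goal: float = 1.0) -> None:
        self.sim = sim
        self.n = iterations_per_step
        self.goal = goal

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.sim.velocity = action           # 1) apply the action while paused
        self.sim.step_control(self.n)        # 2) advance a fixed amount of sim time
        obs = self.sim.position              # 3) observe while paused again
        reward = -abs(self.goal - obs)       # placeholder reward
        return obs, reward, obs >= self.goal
```

Because physics is paused outside step_control, each (action, observation, reward) tuple covers exactly iterations_per_step × max_step_size of simulated time (0.1 s with the defaults above), no matter how long the agent takes to compute its next action.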

Any other options?

soldierofhell commented 3 years ago

Hi @safdark, thanks for your interest in this topic and for the suggested PR. I wonder whether you have tested your service in practice, i.e., in an actual gym-like environment implementation. If so, was it only on ROS1, or on ROS2 as well? In particular, I wonder whether processing ROS callbacks (to get the state from the environment) works as expected. For example, in ROS1 services are synchronous, so we can't spin() callbacks during your step service, and after your service returns the /clock is stopped, right? Can we still process callbacks then? I know I could check this myself, but maybe you've done it already :)

alikureishy commented 3 years ago

Thanks for the note, and the question, @soldierofhell !

To answer your second question about the step_control PR: assuming you are using use_sim_time=True, the impact on ROS nodes after invoking the step_control() service is the same as after invoking the pause_physics() service. The physics (and therefore /clock) halts until an unpause_physics() (or subsequent step_control()) call is made. That is, after all, the desired behavior in both cases. So step_control() does not introduce any new limitation when developing ROS nodes.

Just as with the pause_physics() service, the caller must be careful not to create a deadlock or circular dependency by depending on ROSTime between those service calls. As a rule of thumb, the script or node that controls Gazebo physics (via either pause_physics or step_control) should depend only on real time during that control period, and not on anything tied to ROSTime (which includes topic callbacks), to avoid such deadlocks. This caveat makes sense, since the caller is effectively acting as an external time coordinator in this situation.
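One way to follow that rule of thumb in a Python controller is to bound every wait by wall-clock time, so a frozen /clock (and any sim-time-driven publisher that stops with it) can never block the controller indefinitely. The helper below is a hypothetical sketch in plain Python: it assumes subscriber callbacks run on a separate thread (as they do in rospy) and call put(), while the controller calls wait_newer_than() after each step or pause.

```python
import threading
from typing import Any, Optional, Tuple


class LatestMessage:
    """Thread-safe slot for the newest message delivered by a subscriber callback."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._msg: Any = None
        self._seq = 0  # incremented on every delivery

    def put(self, msg: Any) -> None:
        # Call this from the subscriber callback thread.
        with self._cond:
            self._msg = msg
            self._seq += 1
            self._cond.notify_all()

    def wait_newer_than(self, seq: int, timeout_s: float) -> Optional[Tuple[Any, int]]:
        """Wait for a message newer than `seq`, bounded by wall-clock time.

        Condition.wait_for times out on the monotonic clock, not on ROS/sim time,
        so this returns None instead of hanging if /clock is frozen.
        """
        with self._cond:
            if self._cond.wait_for(lambda: self._seq > seq, timeout=timeout_s):
                return self._msg, self._seq
            return None
```

Returning None on timeout leaves the decision to the controller (retry, step again, or abort the episode) instead of blocking on a clock that only the controller itself can advance.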

To answer your first question: I've been using this in my own work for more than a year. So far I've only used it with ROS1 and have not submitted a PR for ROS2, but I expect the behavior (and the caveat) to be the same on ROS2. I'd be happy to port it to ROS2 if there is more interest from the community; so far the PR hasn't received much feedback.

Hope I've answered your questions clearly.

alikureishy commented 3 years ago

... Adding to my previous answer, @soldierofhell: if use_sim_time=False (i.e., ROS is the time authority), there would be no deadlock, because gazebo_ros would not publish to /clock at all. In that case, you'd need step_control() specifically to keep Gazebo in sync with ROS. Whether that works for you is really a question of whether you want ROS or Gazebo to be the time authority.

goekce commented 2 years ago

For my robotics class, I was experimenting with an RL package using ROS2 and Turtlebot3. Probably because Gazebo time cannot be controlled exactly (in other words, it does not allow step control), the training node waits for a predefined time, such as 100 ms, at each iteration. If the simulation computer is slow, the training step may take longer than 100 ms and the simulation may advance more than expected. This produces unpredictable results and breaks the training.
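A back-of-the-envelope model shows why a fixed wall-clock wait produces inconsistent training steps. The sketch below is a simplification, not a model of Gazebo internals: it assumes a free-running simulation advances at whatever real-time factor the machine achieves, both during the 100 ms wait and during any overrun of the training step, whereas a hypothetical step-control call would advance exactly iterations × max_step_size. The real-time factors and overrun values are illustrative numbers, not measurements.

```python
def sim_advance_wall_wait(wait_s: float, real_time_factor: float,
                          overrun_s: float = 0.0) -> float:
    # Free-running Gazebo keeps simulating during the fixed wait *and* during any
    # extra time the training step takes, at whatever real-time factor it achieves.
    return (wait_s + overrun_s) * real_time_factor


def sim_advance_stepped(iterations: int, max_step_size: float) -> float:
    # With step control, sim time advances by exactly iterations * max_step_size.
    return iterations * max_step_size


# The same 100 ms wait under three load conditions: (real-time factor, overrun in s).
scenarios = [(1.0, 0.0), (0.6, 0.0), (1.0, 0.04)]
free_running = [sim_advance_wall_wait(0.1, rtf, extra) for rtf, extra in scenarios]
stepped = [sim_advance_stepped(100, 0.001) for _ in scenarios]
print(free_running)  # about 0.1, 0.06 and 0.14 s: the step length drifts with load
print(stepped)       # about 0.1 s every time: the step length is fixed
```

With a free-running simulation, the agent sees transitions spanning anywhere from 60 ms to 140 ms of simulated time for the same nominal step; with step control, every transition spans the same 100 ms.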

My plan was to adapt the code from the TensorFlow example for deep Q-learning, but then I noticed that Gazebo needs a step-simulation command, which is how I arrived here.

I expected more attention to this issue. Is there another way to do step control?

Thank you @safdark for pushing this issue and implementing a PR. Unfortunately I need a ROS2 solution.

alikureishy commented 2 years ago

Thanks for the comment, @goekce. I too will soon need to port the solution (or something similar) to ROS2 in my own setup. I'll probably have some time in late March (or April) to look into this further, and will incorporate the use cases and implementation ideas listed by commenters in this thread at that time.

Are you able to make do without this until then?

goekce commented 2 years ago

I will try the following:

I appreciate your concern, @safdark 🙂. In the worst case, I can show my students other examples.