google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.

Wrong behavior with `generalized` and instability with `positional` and `spring` #383

Closed. UltronAI closed this issue 10 months ago

UltronAI commented 11 months ago

Hello there,

I have recently been exploring and experimenting with the Brax Basics Colab, training the built-in APG/PPO algorithms and investigating the impact of the different physics backends. In the process, I've encountered some intriguing behavior that I'd like to share and discuss.
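
For context, the backend is selected when constructing an environment, along the lines of the training colab (the env_name below is just an example):

from brax import envs

# Build the same task under each physics backend for comparison.
environments = {
    backend: envs.get_environment(env_name='reacher', backend=backend)
    for backend in ('generalized', 'positional', 'spring')
}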

  1. Tossing the ball

In this task, the ball is expected to bounce after hitting the ground. The default positional backend reproduced this behavior well and produced satisfying results. However, when I employed the generalized backend, the ball merely rolled along the ground and failed to bounce, which does not match the expected physical behavior.

Here's the visualization with positional in Red, generalized in Green, and spring in Blue:

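A minimal sketch of the kind of rollout being compared here, using the low-level pipeline API from the basics colab (the MJCF string and the number of steps are only illustrative):

import jax
from jax import numpy as jp

from brax.io import mjcf
from brax.generalized import pipeline as generalized_pipeline
from brax.positional import pipeline as positional_pipeline

# A sphere dropped onto a plane, defined as an MJCF string.
ball_xml = """
<mujoco>
  <option timestep="0.005"/>
  <worldbody>
    <geom name="floor" type="plane" size="40 40 40"/>
    <body pos="0 0 3">
      <joint type="free"/>
      <geom size="0.5" type="sphere"/>
    </body>
  </worldbody>
</mujoco>
"""

def ball_heights(pipeline, sys, n_steps=400):
  # Roll out the passive dynamics and record the ball's z coordinate.
  init_fn, step_fn = jax.jit(pipeline.init), jax.jit(pipeline.step)
  state = init_fn(sys, sys.init_q, jp.zeros(sys.qd_size()))
  heights = []
  for _ in range(n_steps):
    state = step_fn(sys, state, jp.zeros(sys.act_size()))
    heights.append(float(state.x.pos[0, 2]))
  return heights

sys = mjcf.loads(ball_xml)
heights = {
    'generalized': ball_heights(generalized_pipeline, sys),
    'positional': ball_heights(positional_pipeline, sys),
}
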
  2. Swinging the pendulum

In this task, the generalized backend performed well, providing the expected results. However, the positional backend collapsed for all step sizes, and the spring backend exhibited instability when the step size increased.

  3. Pointing the pendulum

In this task, only generalized worked well. Both the positional and spring backends failed to maintain stability.

  4. APG training

During my attempts to train an APG agent using these backends, the results varied significantly.

For the Ant task, which has rich contact interactions, APG struggled with the generalized backend but exhibited some learning capacity with positional and spring.

As for the Reacher task, which is a simpler system, APG with generalized outperformed the other two backends.

(Update: I found that Reacher with generalized may also lead to NaN gradients.)

The parameters I used for these two experiments:

import functools
from brax.training.agents.apg import train as apg

train_fns = {
    'ant': functools.partial(apg.train, num_evals=500, episode_length=300, normalize_observations=True,
                             action_repeat=1, num_envs=256, num_eval_envs=128, learning_rate=3e-3,
                             truncation_length=10, seed=args.seed),
    'reacher': functools.partial(apg.train, num_evals=200, episode_length=300, normalize_observations=True,
                                 action_repeat=4, num_envs=128, num_eval_envs=8, learning_rate=3e-3,
                                 truncation_length=10, seed=args.seed),
}
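
Each partial is then applied to an environment built with the backend under test; a minimal usage sketch (following the training colab's calling convention, with train_fns being the dict above):

from brax import envs

env = envs.get_environment(env_name='reacher', backend='generalized')
make_inference_fn, params, metrics = train_fns['reacher'](environment=env)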

Interestingly, these results seem to suggest that the generalized backend excels at handling contacts but struggles to compute useful gradients through them (or that its analytic gradients are simply not useful for training), with the exception of the ball-tossing experiment. On the other hand, the positional and spring backends handle more complex systems efficiently but fail on simpler systems like the pendulum.

(Update: after more experiments with positional and spring, I find that they are unsurprisingly not as good as generalized in some environments and can lead to different learned policies. For instance, in walker2d, PPO with generalized learns to walk with two legs, PPO with positional learns to walk with only one leg, and PPO with spring fails to walk, all using the same training parameters.)

Moreover, I recently came across a paper that examines how different contact models can produce different outcomes. The authors conducted a series of experiments in Brax v1, specifically using the positional and legacy_spring backends.

According to their findings, the discrepancies between the results can be attributed to the different contact models used. This makes me even more curious about the mechanics of the generalized backend, in particular how it models contacts.

To better understand these behaviors and the likely reasons for the inconsistencies above, I would greatly appreciate a more in-depth explanation, or any pointers to resources, on how the generalized backend models contacts.

I look forward to your insightful responses and thanks for your great work on Brax!

amine789 commented 11 months ago

Hi, I am trying to learn the elasticity parameters in positional-based dynamics by generating two scenarios: one simulated with the real elasticity parameters and the other with random elasticity. I then compute a loss with respect to the positions, but the gradient of that loss with respect to elasticity comes out as zero. Did you get something similar?

UltronAI commented 11 months ago

Hi @amine789, while I didn't directly experiment with elasticity parameters, I found some relevant insights in this paper. See the data presented in the final row of Table 1; the authors provide a reasonable explanation in the corresponding section. Hope it's helpful for you!

btaba commented 10 months ago

Hi @UltronAI , thanks for the insightful experiments using brax!

[1] Tossing the ball

The restitution params for spring/positional and generalized are quite different. For spring/positional, restitution is controlled by the elasticity param (https://github.com/google/brax/blob/main/brax/io/mjcf.py#L182), while generalized uses the contact model from MuJoCo, where restitution is controlled by the solver params; see https://mujoco.readthedocs.io/en/latest/modeling.html#restitution

Other params that may affect restitution in positional are the physics timestep and collide_scale, amongst others.
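
To make this concrete, here is a rough sketch of where each knob lives; the values are only illustrative and would need tuning:

from brax.io import mjcf

# Elasticity (a custom numeric read by brax/io/mjcf.py) drives restitution for
# spring/positional, while for generalized a solref dampratio below 1 makes the
# contact under-damped, i.e. bouncy (see the MuJoCo restitution docs above).
bouncy_xml = """
<mujoco>
  <option timestep="0.005"/>
  <custom>
    <numeric data="0.85" name="elasticity"/>
  </custom>
  <worldbody>
    <geom name="floor" type="plane" size="40 40 40" solref="0.02 0.5"/>
    <body pos="0 0 3">
      <joint type="free"/>
      <geom size="0.5" type="sphere" solref="0.02 0.5"/>
    </body>
  </worldbody>
</mujoco>
"""
sys = mjcf.loads(bouncy_xml)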

[2] Swinging the pendulum

I would recommend tuning parameters to maintain stability in positional/spring. Joint constraints are implicitly maintained in generalized since it uses Featherstone's algorithm. For positional/spring, joint constraints are resolved at every time step, so they are likely to be more unstable and need to be tuned (angular/linear damping, etc.).

It isn't clear whether you are training a policy for swinging; are the stability conclusions based on the final policy or on the physics alone?
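
As a rough sketch of that tuning, assuming a pendulum scene file, stability-related parameters can be adjusted either in the MJCF custom section or by overriding fields on the loaded System (values below are only a starting point):

from brax.io import mjcf

sys = mjcf.load('pendulum.xml')  # hypothetical path to the pendulum scene
# System is a dataclass, so its stability-related fields can be overridden:
sys = sys.replace(
    dt=sys.dt / 2,      # smaller physics timestep
    ang_damping=0.1,    # angular velocity damping
    vel_damping=0.1,    # linear velocity damping
)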

[4] APG training

Thanks for the findings! The contact impulses are backed out through the constraint solver in generalized, so it is not too surprising that its gradients may be less useful (autograd through a constraint solve) compared to the simple impulse-based contact updates in spring/positional.
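
A quick way to see the difference is to differentiate through a short rollout of each pipeline and compare the gradients (a sketch only; the scene file, perturbation, and loss are arbitrary):

import jax
from jax import numpy as jp

from brax.io import mjcf
from brax.generalized import pipeline as generalized_pipeline
from brax.positional import pipeline as positional_pipeline

sys = mjcf.load('ant.xml')  # any scene with contacts; the path is illustrative

def make_loss(pipeline, n_steps=10):
  def loss(qd0):
    # Autograd flows through the constraint solve (generalized) or the
    # impulse-based contact updates (positional/spring).
    state = pipeline.init(sys, sys.init_q, qd0)
    for _ in range(n_steps):
      state = pipeline.step(sys, state, jp.zeros(sys.act_size()))
    return state.x.pos[0, 2]  # height of the first body after the rollout
  return loss

qd0 = jp.zeros(sys.qd_size())
grad_generalized = jax.grad(make_loss(generalized_pipeline))(qd0)
grad_positional = jax.grad(make_loss(positional_pipeline))(qd0)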

For further reference on the contact model for generalized, see https://mujoco.readthedocs.io/en/latest/computation.html#contact