Legged robots stand still despite the rewards of tracking linear and angular velocities

RainJCloude commented 3 months ago

Hello, I am trying to use Isaac Orbit to train solo12. I followed the guide to generate the .usd, and the robot spawns in the environment. I am also using the same reward functions of ANYmal (of course changing the termination and every thing related to the name of the links), and the same observation and action space. However, solo12 prefer to stand still after falling several time instead of moving.

I really have no idea that why this is happening. It doesn't make sense that the robot that receive reward function for the tracking of linear and angular velocity, remains still.

Again, the rl_task_env_cfg is basically the same as the repository, except the termination condition. Also the rest of the training algorithm is the same. The only thing that I've generated is the asset solo12.py in which i gave the gains for the PD controller and the initial configuration.

Thanks in advance

Mayankm96 commented 3 months ago

Are you using the rough terrain, or is this on the flat terrain?

The robot may learn to stay still if the penalty for moving is too high compared to the reward for the task. It is hard to provide feedback on that without more information.

RainJCloude commented 3 months ago

Flat terrain. I know that it's hard try to help me with these few information, but I am using the whole orbit code without any kind of modification, just with another robot. The fact that it is standing still could be due to wrong setting of PD gains? A wrong initial configuration? A wrong conversion of the URDF to USD? Have anyone ever had a similar issue?

dhananjayrajmane commented 2 months ago

@RainJCloude Can you check enabled_self_collisions variable ? Set to false and check. Similar kind of issue faced previously, after setting enabled_self_collisions variable to False training worked perfect

RainJCloude commented 2 months ago

It is set to False. I've basically copied and pasted all the code but just including my Usd. I also think that my usd is correct because I am able to give open loop command to the joints through the Omnigraph controller.

RainJCloude commented 2 months ago

Okay maybe I can explain in a better way my problem: Screenshot from 2024-06-10 17-27-34

The quadruped is not able to train because the value loss starts from a gigantic value, and the robot never learns to walk within its 900 training episodes.

About the configuration of the environment I've set reward, and observation copying that from UnitreeGo1.

About the asset, I've used this values for the actuator, maybe the problem could be related to these:

ACTUATOR_LEGS_ALL = IdealPDActuatorCfg(
    joint_names_expr=[".*_HAA", ".*_HFE", ".*_KFE"],
    effort_limit=5.0,
    velocity_limit=1.0,
    stiffness={".*": 5},
    damping={".*": 0.05}
)

KyleM73 commented 2 months ago

I know this issue is a little old but a few suggestions:

let it keep training. even for standard robots like Go1 ive seen it learn to stand still and it wasn't until iteration ~2000 that it started learning to walk
turn down the weight of l2 acceleration, it should never be that large
record videos during training so you can see what behaviors the robot is learning
increase your PD actuator gains depending on the size of the robot. for reference Spot (35 kg) is P=60 D=1.5, Go1 (13 kg) is P=20 D=0.5
make the standard deviation param for velocity tracking smaller. I find that the defaults are too permissive and rewards the robot even for somewhat large velocity errors. Making std smaller forces the policy to minimize the error more aggressively. for example see this comparison: https://www.desmos.com/calculator/hchecap9yl
you might also experiment with a normed exponential kernel instead of a square exponential, as the norm exp(-||x||/std^2) is more aggressive (but has some artifacting for very small errors about zero)

good luck!!

RainJCloude commented 2 months ago

Thank you so much! I solved last week and the problem was actually the acceleration. Moreover, also solo12 also need 800 episode to be trained. Do you know why it requires so much time whereas ANYmal learns to walk in just 300 episodes?

thanks again a lot. Your suggestions have been really appreciated

KyleM73 commented 2 months ago

Great! As for the Anymal and Go1 training speeds, I think they're influenced by two factors: 1) Both robots have actuator nets trained for them. This helps bias the policy towards walking very quickly. 2) The rewards were tuned for these robots specifically- with sufficient reward tuning other robots should be able to approach similar training speeds, but it won't happen out of the box with rewards designed for a different robot (even though very similar)

isaac-sim / IsaacLab

Legged robots stand still despite the rewards of tracking linear and angular velocities #458