realquantumcookie / APRL

Efficient Real-World RL for Legged Locomotion via Adaptive Policy Regularization
MIT License

How to reproduce 0.44m/s "Restricted"? #1

Open YouJiacheng opened 11 months ago

YouJiacheng commented 11 months ago

Dear author: Thanks a lot for developing APRL and open-sourcing an official implementation.

I have a question about the performance of "Restricted". Fig. 3 in [46] (Demonstrating a Walk in the Park: Learning to Walk in 20 Minutes with Model-Free Reinforcement Learning) shows that the robot can move on flat ground at an average speed of 0.06 m/s after 20 minutes of training. Fig. 6 in the APRL paper shows that the robot can move on flat ground at an average speed of 0.44 m/s after 20 minutes.

That is roughly a 7x difference. I have noticed that the APRL paper used a Go1 while [46] used an A1, and that different velocity measurements might have been used (tracking camera vs. Kalman filter). I want to know if there are any other differences between "Restricted" and [46].

Thanks!

realquantumcookie commented 11 months ago

Hi there, thank you for your question. The differences between "Restricted" and [46] are:

  1. The restricted method shares the same action space and a similar observation space as [46], but uses slightly different reward shaping. The only difference in the observation space is that we used normalized foot contact forces (in the restricted method) instead of the binary foot contact observations used in [46], because our Go1 robot's foot contact sensors are not very reliable (see the first sketch below this list).
  2. The velocity measurement for the restricted method comes from a tracking camera, while the velocity measurement for [46] comes from a Kalman filter that combines information from (1) forward kinematics and (2) the onboard accelerometer. This measurement is used in both the observation and the reward function.
  3. Please look at our project website for the reward function. The main changes affecting learning speed are that we scaled up the velocity reward and used a near-quadratic term for it. This kind of reward shaping lets the algorithm pick up the reward signal earlier in training and thus speeds up training (see the second sketch below).
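To make the observation difference in point 1 concrete, here is a minimal sketch. The function names, the contact threshold, and the normalization constant are illustrative assumptions, not the actual APRL code:

```python
import numpy as np

def binary_contact_obs(foot_forces, threshold=5.0):
    """[46]-style observation: 1 if a foot is in contact, else 0."""
    return (np.asarray(foot_forces) > threshold).astype(np.float32)

def normalized_contact_obs(foot_forces, max_force=100.0):
    """Restricted-style observation: continuous force scaled to [0, 1]."""
    forces = np.clip(np.asarray(foot_forces), 0.0, max_force)
    return (forces / max_force).astype(np.float32)

# A noisy reading near the contact threshold flips the binary observation,
# but only slightly perturbs the normalized one.
readings = [4.0, 6.0, 80.0, 0.0]          # per-foot force readings (N)
print(binary_contact_obs(readings))       # roughly [0, 1, 1, 0]
print(normalized_contact_obs(readings))   # roughly [0.04, 0.06, 0.8, 0.0]
```

A continuous force reading degrades more gracefully when the sensor is unreliable, whereas a hard threshold makes the binary bit flip on noise.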
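And here is a minimal sketch of the reward-shaping idea in point 3. The scale, exponent, and target velocity are illustrative assumptions; the actual reward function is documented on the project website:

```python
import numpy as np

def linear_velocity_reward(v_x, v_target=1.0):
    """Flat, clipped velocity term similar in spirit to [46]."""
    return min(v_x, v_target)

def shaped_velocity_reward(v_x, v_target=1.0, scale=10.0, power=1.8):
    """Scaled, near-quadratic velocity term (exponent slightly below 2)."""
    progress = np.clip(v_x / v_target, 0.0, 1.0)
    return scale * progress ** power

# Early in training the robot barely moves; compare what each term assigns
# to small forward velocities.
for v in (0.05, 0.2, 0.5):
    print(v, linear_velocity_reward(v), round(shaped_velocity_reward(v), 3))
# prints roughly:
#   0.05 0.05 0.045
#   0.2  0.2  0.552
#   0.5  0.5  2.872
```

With the scale-up, even modest improvements in forward velocity change the return noticeably relative to the other reward terms, which is what we mean by the algorithm picking up the velocity signal earlier in training.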
YouJiacheng commented 11 months ago

Thank you for your comprehensive and in-depth explanation!

realquantumcookie commented 11 months ago

Hi @YouJiacheng, I will leave this issue open in case other people have similar questions.