Small decay and small mass are still a nightmare for the policy.
There is a significant performance drop when moving from positive to negative decay, which may explain why performance degrades sharply when the damping ratio is close to zero.
Current problem: the learned policy's performance deteriorates at marginal parameter values, i.e. near the edges of the training range.
Our research question: how to overcome RMA's limitation in the OOD case (both model and parameters), which requires introducing a better architecture, representation, update method, and feedback.
Maybe MAML can help with this issue, since our current policy is effectively a recurrent form of meta-learning (see the sketch below).
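A minimal sketch of a first-order MAML (FOMAML) update for this setting; the `task.loss` interface and the learning rates are hypothetical placeholders, not our actual training code:

```python
import copy

import torch

def fomaml_step(policy, tasks, inner_lr=1e-2, outer_lr=1e-3):
    """One first-order MAML meta-update: adapt a copy of the policy on each
    task (e.g. an environment with a sampled damping value), then apply the
    averaged post-adaptation gradients to the meta-parameters."""
    meta_grads = [torch.zeros_like(p) for p in policy.parameters()]
    for task in tasks:
        adapted = copy.deepcopy(policy)
        # Inner loop: one gradient step on the task's support data
        inner_loss = task.loss(adapted)  # hypothetical per-task loss API
        grads = torch.autograd.grad(inner_loss, list(adapted.parameters()))
        with torch.no_grad():
            for p, g in zip(adapted.parameters(), grads):
                p -= inner_lr * g
        # Outer loss: post-adaptation loss on fresh task data
        outer_loss = task.loss(adapted)
        outer_grads = torch.autograd.grad(outer_loss, list(adapted.parameters()))
        for mg, g in zip(meta_grads, outer_grads):
            mg += g
    # First-order approximation: treat the post-adaptation gradients as the
    # meta-gradient and apply their average to the original policy
    with torch.no_grad():
        for p, mg in zip(policy.parameters(), meta_grads):
            p -= outer_lr * mg / len(tasks)
```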
Assumption: the problem is caused by overfitting.
| trained damping range | [0, 0.05) | [0, 0.3) | compare (large - small) |
| --- | --- | --- | --- |
| rewards | | | |

Training on the small damping range is much better!
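If the overfitting hypothesis holds, one straightforward mitigation is to keep resampling the damping over the full range at every reset so the policy cannot lock onto a single mode. A minimal sketch, assuming a hypothetical base environment with a settable `damping` attribute:

```python
import numpy as np

class DampingRandomizedEnv:
    """Sketch of a wrapper that resamples the damping coefficient at every
    reset, so training covers the whole range (including near-zero values,
    the hard regime observed above)."""

    def __init__(self, env, damping_range=(0.0, 0.3)):
        self.env = env  # hypothetical base environment
        self.damping_range = damping_range

    def reset(self):
        lo, hi = self.damping_range
        self.env.damping = np.random.uniform(lo, hi)  # uniform resampling
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```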
⭐️ Overfitting to a single mode.
Possible solutions
The expert policy performs well in most cases. However, there are some parameter values for which the policy performs poorly, which drags down the expert policy's overall performance.
Sensitivity analysis
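A minimal sketch of such a sweep, assuming a hypothetical `make_env(damping=...)` factory and a gym-style policy/env interface, to locate exactly where the reward collapses:

```python
import numpy as np

def damping_sensitivity(policy, make_env, dampings, episodes=10):
    """Evaluate mean episode return as a function of the damping coefficient.
    `make_env` and the policy/env interfaces are hypothetical placeholders."""
    results = {}
    for d in dampings:
        env = make_env(damping=d)
        returns = []
        for _ in range(episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = env.step(policy(obs))
                total += reward
            returns.append(total)
        results[d] = float(np.mean(returns))
    return results

# Example: a fine-grained sweep around the problematic near-zero regime
# damping_sensitivity(policy, make_env, np.linspace(0.0, 0.3, 13))
```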