jc-bao / policy-adaptation-survey

This repository compares prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

Expert performance decay in extreme cases #8

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago

The expert policy performs well in most cases. However, for some parameter values the policy performs poorly, which leads to a performance drop for the expert policy overall.

jc-bao commented 1 year ago

Trial 1: train on a wider parameter range, then evaluate on a smaller one.
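A minimal sketch of this trial's setup (domain randomization with a wider training range than evaluation range). The range bounds and function names here are illustrative placeholders, not the repo's actual config:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bounds -- the real values come from the experiment config.
TRAIN_RANGE = (-0.3, 0.3)   # wider damping range used during training
EVAL_RANGE = (0.0, 0.05)    # narrower range used at evaluation time

def sample_damping(lo, hi, n=1):
    """Sample environment damping parameters uniformly from [lo, hi)."""
    return rng.uniform(lo, hi, size=n)

# Training episodes randomize over the wide range; evaluation uses the narrow one.
train_params = sample_damping(*TRAIN_RANGE, n=5)
eval_params = sample_damping(*EVAL_RANGE, n=5)
```

The evaluation range being a strict subset of the training range is the point: the expert should have seen every evaluated parameter during training.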

[image]
jc-bao commented 1 year ago
[image]

Small damping and small mass are still a nightmare for the policy.

jc-bao commented 1 year ago
[image]
jc-bao commented 1 year ago

Crossing from positive damping to negative damping causes a significant performance drop, which could explain why performance degrades sharply when the damping ratio is close to zero.

[image]
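The sign change matters because the dynamics are qualitatively different on either side: positive damping dissipates energy, while negative damping injects it, so trajectories diverge. A toy worked example with a damped harmonic oscillator (x'' + 2ζωx' + ω²x = 0; the integrator and parameter values are illustrative):

```python
def simulate_oscillator(zeta, omega=1.0, dt=1e-3, steps=20000):
    """Integrate x'' + 2*zeta*omega*x' + omega^2*x = 0 with semi-implicit Euler.

    Returns |x| + |v| after steps*dt seconds as a rough amplitude proxy.
    """
    x, v = 1.0, 0.0
    for _ in range(steps):
        a = -2.0 * zeta * omega * v - omega ** 2 * x
        v += a * dt
        x += v * dt
    return abs(x) + abs(v)

# Positive damping: the state decays toward the origin.
# Negative damping: the same dynamics become unstable and the state grows.
stable_amp = simulate_oscillator(zeta=0.1)
unstable_amp = simulate_oscillator(zeta=-0.1)
```

A policy trained mostly on the dissipative regime has never had to stabilize a self-exciting system, which is consistent with the observed drop near ζ ≈ 0.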
jc-bao commented 1 year ago

Does this relate to our research question?

Current problem: the learned policy's performance deteriorates at the margins of the parameter range.

Our research question: overcome RMA's limitations in the OOD case (both model and parameters), which requires a better architecture, representation, update method, and feedback.

Can meta-learning solve this problem?

Maybe MAML can help with this issue, since our current policy is effectively a recurrent version of meta-learning.
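For reference, a minimal first-order MAML (FOMAML) meta-update on a toy 1-D regression family. Everything here (the task family, learning rates, parameterization) is a hypothetical sketch for intuition, not the repo's training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss_grad(w, slope):
    """MSE loss and gradient for fitting y = slope*x with the model y = w*x."""
    x = rng.normal(size=32)
    err = w * x - slope * x
    return np.mean(err ** 2), np.mean(2.0 * err * x)

def maml_step(w, slopes, inner_lr=0.1, outer_lr=0.01):
    """One meta-update: adapt per task with one gradient step,
    then move the meta-initialization using the post-adaptation gradient
    (first-order approximation of MAML's outer gradient)."""
    meta_grad = 0.0
    for s in slopes:
        _, g = task_loss_grad(w, s)
        w_adapted = w - inner_lr * g              # inner adaptation step
        _, g_post = task_loss_grad(w_adapted, s)  # gradient at adapted params
        meta_grad += g_post
    return w - outer_lr * meta_grad / len(slopes)

w = 0.0
slopes = [0.5, 1.0, 1.5]  # hypothetical "environment parameter" distribution
for _ in range(200):
    w = maml_step(w, slopes)
# w drifts toward an initialization that adapts quickly to every task
# (here, near the center of the task distribution).
```

The hope would be that explicit fast adaptation at the margins beats relying on the recurrent policy's implicit adaptation alone; whether that actually fixes the boundary-parameter drop is the open question.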

Assumption: the problem is caused by overfitting.

jc-bao commented 1 year ago

Comparison: training only on small parameter values vs. training on all possible values.

| trained damping range | [0, 0.05) | [0, 0.3) | compare (large - small) |
| --- | --- | --- | --- |
| rewards | [image] | [image] | [image] |

Training only on the small damping range performs much better!
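The comparison above boils down to sweeping both trained policies over a damping grid and comparing mean rewards point by point. A schematic of that evaluation loop, with stand-in reward curves (the policy interface and the reward shapes are invented for illustration; only the repo's real evaluation data supports the conclusion):

```python
import numpy as np

def evaluate(reward_fn, damping_grid, episodes=10):
    """Mean reward of a policy at each damping value (hypothetical interface)."""
    return np.array(
        [np.mean([reward_fn(d) for _ in range(episodes)]) for d in damping_grid]
    )

def reward_small(d):
    """Stand-in for the policy trained on [0, 0.05): strong in range, falls off outside."""
    return 1.0 - 4.0 * max(0.0, d - 0.05)

def reward_large(d):
    """Stand-in for the policy trained on [0, 0.3): flatter but lower everywhere."""
    return 0.9

grid = np.linspace(0.0, 0.3, 7)
r_small = evaluate(reward_small, grid)
r_large = evaluate(reward_large, grid)
diff = r_large - r_small  # the "compare (large - small)" column
```

This framing makes the trade-off explicit: specializing on the small range buys in-range performance at the cost of everything outside it, which is exactly the single-mode overfitting identified below.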

jc-bao commented 1 year ago

Problem identified

⭐️Overfitting to single mode.

Possible solutions