Small decay and small mass are still a nightmare for the policy.
There is a significant performance drop when moving from positive to negative decay, which may explain why performance degrades sharply when the damping ratio is close to zero.
Current problem: the learned policy's performance deteriorates at marginal parameter values, i.e. near the edges of the training range.
Our research question: how to overcome RMA's limitation in the OOD case (both model and parameters), which requires introducing a better architecture, representation, update method, and feedback.
Maybe MAML can help with this issue, since our current policy is effectively a recurrent form of meta-learning (see the sketch below).
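A minimal sketch of a first-order MAML (FOMAML) update for this setting; the `task.loss` interface and the learning rates are hypothetical placeholders, not our actual training code:

```python
import copy

import torch

def fomaml_step(policy, tasks, inner_lr=1e-2, outer_lr=1e-3):
    """One first-order MAML meta-update: adapt a copy of the policy on each
    task (e.g. an environment with a sampled damping value), then apply the
    averaged post-adaptation gradients to the meta-parameters."""
    meta_grads = [torch.zeros_like(p) for p in policy.parameters()]
    for task in tasks:
        adapted = copy.deepcopy(policy)
        # Inner loop: one gradient step on the task's support data
        inner_loss = task.loss(adapted)  # hypothetical per-task loss API
        grads = torch.autograd.grad(inner_loss, list(adapted.parameters()))
        with torch.no_grad():
            for p, g in zip(adapted.parameters(), grads):
                p -= inner_lr * g
        # Outer loss: post-adaptation loss on fresh task data
        outer_loss = task.loss(adapted)
        outer_grads = torch.autograd.grad(outer_loss, list(adapted.parameters()))
        for mg, g in zip(meta_grads, outer_grads):
            mg += g
    # First-order approximation: treat the post-adaptation gradients as the
    # meta-gradient and apply their average to the original policy
    with torch.no_grad():
        for p, mg in zip(policy.parameters(), meta_grads):
            p -= outer_lr * mg / len(tasks)
```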
Assumption: the problem is caused by overfitting.
| trained damping range | [0, 0.05) | [0, 0.3) | compare (large - small) |
| --- | --- | --- | --- |
| rewards | | | |

Training on the small damping range is much better!
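If the overfitting hypothesis holds, one straightforward mitigation is to keep resampling the damping over the full range at every reset so the policy cannot lock onto a single mode. A minimal sketch, assuming a hypothetical base environment with a settable `damping` attribute:

```python
import numpy as np

class DampingRandomizedEnv:
    """Sketch of a wrapper that resamples the damping coefficient at every
    reset, so training covers the whole range (including near-zero values,
    the hard regime observed above)."""

    def __init__(self, env, damping_range=(0.0, 0.3)):
        self.env = env  # hypothetical base environment
        self.damping_range = damping_range

    def reset(self):
        lo, hi = self.damping_range
        self.env.damping = np.random.uniform(lo, hi)  # uniform resampling
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```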
⭐️ Overfitting to a single mode.
Possible solutions
The expert policy performs well in most cases. However, there are some parameter values for which the policy performs poorly, which drags down the expert policy's overall performance.
Sensitivity analysis
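A minimal sketch of such a sweep, assuming a hypothetical `make_env(damping=...)` factory and a gym-style policy/env interface, to locate exactly where the reward collapses:

```python
import numpy as np

def damping_sensitivity(policy, make_env, dampings, episodes=10):
    """Evaluate mean episode return as a function of the damping coefficient.
    `make_env` and the policy/env interfaces are hypothetical placeholders."""
    results = {}
    for d in dampings:
        env = make_env(damping=d)
        returns = []
        for _ in range(episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = env.step(policy(obs))
                total += reward
            returns.append(total)
        results[d] = float(np.mean(returns))
    return results

# Example: a fine-grained sweep around the problematic near-zero regime
# damping_sensitivity(policy, make_env, np.linspace(0.0, 0.3, 13))
```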