jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

Task issue: margin between different methods is small. #11

Closed: jc-bao closed this issue 1 year ago

jc-bao commented 1 year ago

Experiment settings

        self.mass_min, self.mass_max = 0.006, 0.03
        self.delay_min, self.delay_max = 0, 0
        self.decay_min, self.decay_max = 0.0, 0.1
        self.res_dyn_param_min, self.res_dyn_param_max = -1.0, 1.0
        self.disturb_min, self.disturb_max = -0.8, 0.8
        self.action_noise_std, self.obs_noise_std = 0.00, 0.00
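
As a rough illustration of how these ranges might be used, the sketch below draws one set of environment parameters per episode. Uniform sampling, the `PARAM_RANGES` dictionary, and the `sample_env_params` helper are assumptions made for illustration; the thread does not show how the repository actually samples them.

```python
import numpy as np

# Ranges copied from the settings above. Uniform sampling is an assumption.
PARAM_RANGES = {
    "mass": (0.006, 0.03),
    "delay": (0, 0),
    "decay": (0.0, 0.1),
    "res_dyn_param": (-1.0, 1.0),
    "disturb": (-0.8, 0.8),
}


def sample_env_params(rng, ranges=PARAM_RANGES):
    """Draw one value per parameter, uniformly within [min, max]."""
    return {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in ranges.items()}


rng = np.random.default_rng(0)
params = sample_env_params(rng)
print(params)
```

With these settings the delay is always 0 and both noise standard deviations are 0, so only mass, decay, residual dynamics, and disturbance vary between episodes.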

Current result

Last 10 steps average tracking error:

| Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
|---|---|---|---|
| 0.0116 | 0.0126 | 0.0121 | 0.0145 |

| C-4 Expert | C-4 RMA before adaptation | C-4 RMA after adaptation |
|---|---|---|
| 0.0115 | 0.0157 | 0.0118 |

*C-4: uses an MLP to compress all parameters into a 4-dimensional embedding.
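A minimal sketch of the C-4 idea, assuming a small tanh MLP that maps a vector of environment parameters to a 4-dimensional embedding. The input size `PARAM_DIM`, the hidden width, the activation, and the random initialization are all hypothetical; the thread only states that the embedding has 4 dimensions.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Random weights and zero biases for layer sizes like [in, hidden, out]."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def encode(params, layers):
    """Map a batch of parameter vectors to embeddings (tanh on hidden layers only)."""
    h = params
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.tanh(h)
    return h


rng = np.random.default_rng(0)
PARAM_DIM, EMBED_DIM = 8, 4  # PARAM_DIM is a placeholder, EMBED_DIM follows C-4
layers = init_mlp(rng, [PARAM_DIM, 32, EMBED_DIM])
e = encode(rng.uniform(-1.0, 1.0, size=(5, PARAM_DIM)), layers)
print(e.shape)
```

In an RMA-style setup, the embedding would presumably be passed to the policy together with the state; in a real run the encoder would be trained jointly with the policy rather than left at random weights.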

Possible solutions

jc-bao commented 1 year ago

Trial 1: add residual dynamics.

$f(v,u,w)$

$\pi(x,e)$ error: 0.264

$f(v,w)$

MLP: $\pi(x,e)$ error: 0.217 (variance is large); $\pi(x)$ error: 0.123

Polynomial ✔️: $\pi(x,e)$ error: 0.053; $\pi(x)$ error: 0.16
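For illustration only, one way a polynomial residual term $f(v,w)$ could look is a polynomial in $v$ whose coefficients are the parameters $w$. The thread does not define $v$ or $w$ or give the polynomial form, so the reading of $v$ as a scalar velocity, $w$ as a coefficient vector, and the helper `residual_force` are all assumptions.

```python
import numpy as np


def residual_force(v, w):
    """Evaluate w[0] + w[1]*v + w[2]*v**2 + ... for a scalar v (hypothetical form)."""
    w = np.asarray(w, dtype=float)
    powers = np.array([v ** k for k in range(len(w))], dtype=float)
    return float(np.dot(w, powers))


# Example: coefficients drawn from the res_dyn_param range [-1.0, 1.0].
w = [1.0, 0.5, -0.25]
print(residual_force(2.0, w))  # 1.0 + 0.5*2 - 0.25*4 = 1.0
```

A low-order polynomial like this is smooth in both $v$ and $w$, which may be one reason it yields a clearer gap between $\pi(x,e)$ and $\pi(x)$ than the MLP variant, though the thread does not confirm this.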

Experiment with polynomial residual dynamics

Last 10 steps average tracking error:

| Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
|---|---|---|---|
| 0.053 | 0.083 | 0.058 | 0.147 |

| C-4 Expert | C-4 RMA before adaptation | C-4 RMA after adaptation |
|---|---|---|
| 0.067 | 0.097 | 0.074 |

| $f(v)$ Expert | $f(v)$ RMA before adaptation | $f(v)$ RMA after adaptation |
|---|---|---|
| 0.008 | 0.028 | 0.007 |
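
The "last 10 steps average tracking error" reported in the tables could be computed along the following lines. This is a sketch under assumptions: the per-step error is taken to be the Euclidean distance between position and target, and `last_k_tracking_error` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def last_k_tracking_error(positions, targets, k=10):
    """Mean Euclidean tracking error over the final k steps.

    positions, targets: arrays of shape (T, D), one row per time step.
    """
    positions = np.asarray(positions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    errors = np.linalg.norm(positions - targets, axis=-1)  # shape (T,)
    return float(errors[-k:].mean())


# Example: a 20-step rollout whose last 10 steps sit 0.05 away from the target.
positions = np.zeros((20, 3))
targets = np.zeros((20, 3))
targets[-10:] = [0.03, 0.04, 0.0]
print(last_k_tracking_error(positions, targets))
```

Averaging only the final steps measures steady-state tracking after the adaptation module has had time to converge, which is why the "before adaptation" and "after adaptation" columns can differ on the same task.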