jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods in both the control and learning communities.
Apache License 2.0

Project vector & velocity record. #12

Open jc-bao opened 1 year ago

jc-bao commented 1 year ago

This issue is used to keep a record of the research emphasis and engineering effort of each research period.

jc-bao commented 1 year ago

:one: Identify research questions

possible issues:

jc-bao commented 1 year ago

Meeting@2023.1.17

jc-bao commented 1 year ago

Meeting@2023.1.28

image

TODO

jc-bao commented 1 year ago

:no_entry: Vector: expert performance under residue dynamics

Research question: how to enable a model-free method to achieve better performance given complex dynamics?

Expected result: better tracking performance in residue dynamics.

❌ Velocity: Curriculum

:warning: Deprecated because the setup might not match the real-world case (too many hand-engineered dynamics).

:eye_speech_bubble: Hindsight: wrong track. Our research question is how to achieve high-performance adaptive control, not how to get better model-free performance.

jc-bao commented 1 year ago

:stop_button: Vector: adaptor performance given an imperfect model

Question: how to enable the adaptor to generalize to unseen scenarios?

❌ Velocity: Soft update adaptor

Origin (L2 loss to z)

Train(policy=expert, ) Adapt(policy=adaptor, )

Perfect model

| Method | Expert | Adapt begin | Adapt end |
| --- | --- | --- | --- |
| Baseline | 0.0444 | 0.0518 | 0.0395 |
| ARC0 | 0.0444 | 1.3167 | 0.3180 |
| ARC1 | 0.0444 | 0.7872 | 0.1423 |
| ARC10 | 0.0444 | 0.0770 | 0.0363 |
| ARC50 | 0.0444 | 0.0572 | 0.0384 |
| TUC-1 | 0.1663 | 0.1810 | 0.1635 |
| TUC-3 | 0.1229 | 0.0784 | 0.0961 |

Imperfect model

| Method | Expert | Adapt begin | Adapt end |
| --- | --- | --- | --- |
| Baseline | 0.5054 | 0.1674 | 0.2007 |
| Baseline-Adapt | 0.0379 | 0.0435 | 0.0416 |
| TUC-3 | 0.5338 | 0.2005 | 0.2476 |
| TUC-3-Adapt | 0.0453 | 0.0514 | 0.0470 |

:eye_speech_bubble: Hindsight: wrong track. We dove into details before identifying the true research question.

▶️ Vector: identify the adaptor module problem

Question: under what circumstances can we observe a significant performance drop for the RMA algorithm?

🔧 Velocity1: Unobservable parameters.

Polynomial residue dynamics

$f(v,w) = x^T M x + C$, where $x=[v, w]$, $v \in \mathbb{R}^3$, $w \sim \mathcal{U}(-1,1) \in \mathbb{R}^{d_w}$, $C \sim \mathcal{U}(-1,1) \in \mathbb{R}^3$, $M \sim \mathcal{U}(-1,1) \in \mathbb{R}^{3 \times 4 \times 4}$.
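As a rough illustration of the polynomial residue dynamics above, the following pure-Python sketch samples $M$ and $C$ and evaluates one quadratic form per force axis. The function names `sample_polynomial_dynamics` and `residue_force` are hypothetical, and the sketch assumes $x$ has $3 + d_w$ entries, so only $d_w=1$ reproduces the $3 \times 4 \times 4$ shape quoted for $M$; the repository's actual implementation may differ.

```python
import random


def sample_polynomial_dynamics(d_w=1, seed=0):
    """Sample M and C element-wise from U(-1, 1).

    Assumption: x = [v, w] has 3 + d_w entries, so M is stored as
    3 matrices of shape (3 + d_w) x (3 + d_w). With d_w = 1 this
    matches the 3 x 4 x 4 shape given in the text.
    """
    rng = random.Random(seed)
    n = 3 + d_w
    M = [[[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
         for _ in range(3)]
    C = [rng.uniform(-1, 1) for _ in range(3)]
    return M, C


def residue_force(v, w, M, C):
    """Evaluate f(v, w) = x^T M x + C, one quadratic form per output axis."""
    x = list(v) + list(w)
    n = len(x)
    out = []
    for k in range(3):
        quad = sum(x[i] * M[k][i][j] * x[j]
                   for i in range(n) for j in range(n))
        out.append(quad + C[k])
    return out
```

For example, `residue_force([0.1, 0.0, -0.2], [0.5], *sample_polynomial_dynamics())` returns a 3-element list, one residue force component per axis.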

Last 10 steps average tracking error.

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=2$ | 0.027 | 0.065 | 0.029 | 0.147 |
| $d_w=1$ | 0.063 | 0.101 | 0.077 | |
| $d_w=1$ C-4 | 0.067 | 0.097 | 0.074 | |
| $d_w=0$ | 0.008 | 0.028 | 0.007 | |

*C-4: use MLP to compress all parameters to a 4-dimensional embedding.
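The "last 10 steps average tracking error" metric used in these tables could be computed along the following lines. This is a minimal sketch that assumes the error is the Euclidean distance between the position and its reference at each step; the source does not state the exact definition, and `last_k_tracking_error` is a hypothetical name.

```python
import math


def last_k_tracking_error(positions, references, k=10):
    """Mean Euclidean tracking error over the last k steps of a rollout.

    Assumption: each entry of `positions` and `references` is a
    3-element sequence for the same time step.
    """
    if len(positions) != len(references):
        raise ValueError("positions and references must have equal length")
    pairs = list(zip(positions, references))[-k:]
    if not pairs:
        raise ValueError("rollout is empty")
    return sum(math.dist(p, r) for p, r in pairs) / len(pairs)
```

Averaging only the tail of the rollout keeps the initial transient out of the score, which matters when comparing results before and after adaptation.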

MLP $f(v, w)$ residue dynamics

[128, 128] MLP initialized with `nn.init.orthogonal_(m.weight, gain=1)` and `nn.init.uniform_(m.bias, -0.2, 0.2)`.

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=1$ | 0.0141 | 0.0294 | 0.0161 | 0.0354 |
| $d_w=1$ C-4 | 0.0169 | 0.0183 | 0.0164 | |

Plots ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot, ppo_mlp_expert_Dw1_vis

MLP $f(v, u,w)$ residue dynamics

low sensitivity to u

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=0$ | 0.0189 | 0.0215 | 0.0183 | 0.0640 |
| $d_w=1$ | 0.0386 | 0.0422 | 0.0274 | |
| $d_w=1$ C-4 | 0.0103 | 0.0120 | 0.0092 | |

Plots ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot, ppo_mlp_expert_Dw1_vis

high sensitivity to u (by using an MLP with non-zero bias)

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=0$ | 0.0287 | 0.0340 | 0.0296 | 0.0586 |
| $d_w=1$ | 0.0687 | 0.0449 | 0.0331 | |
| $d_w=1$ C-4 | 0.0367 | 0.0331 | 0.0269 | |

Plots ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot, ppo_mlp_expert_Dw1_vis

high sensitivity + force scale *2

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla |
| --- | --- | --- | --- | --- |
| $d_w=1$ | 0.0687 | 0.0449 | 0.0331 | |
| $d_w=1$ C-4 | 0.0367 | 0.0331 | 0.0269 | |

Plots ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot, ppo_mlp_expert_Dw1_vis

🔧 Velocity2: Parameter OOD Cases.

Extrapolation OOD case

| Setting | Expert | RMA before adaptation | RMA after adaptation | Plot |
| --- | --- | --- | --- | --- |
| OOD | 0.2074 | 0.4166 | 0.2110 | ppo_mlp_expert_Dw1_plot |
| w/o OOD | 0.0687 | 0.0449 | 0.0331 | ppo_mlp_expert_Dw1_plot |
| C-4 OOD | 0.3596 | 0.3225 | 0.3516 | |
| C-4 w/o OOD | 0.0629 | 0.0482 | 0.0403 | |

| | w/o OOD | OOD |
| --- | --- | --- |
| mass-decay | image-20230114213322197 | image-20230114213715284 |
| decay-param | image-20230114213417826 | image-20230114213656702 |
| param-mass | image-20230114213450407 | image-20230114214003326 |

Updated results

| Policy | Expert | Before adaptation | After adaptation |
| --- | --- | --- | --- |
| Baseline | 0.2287 | nan | nan |
| Baseline-OOD (full dynamic param) | 0.2567 | nan | nan |
| Baseline-OOD | 0.3688 | nan | nan |
| RMA | 0.0945 | 0.1784 | 0.1455 |
| RMA-OOD (full dynamic param) | 0.1448 | 0.2841 | 0.2186 |
| RMA-OOD | 0.2119 | 0.3287 | 0.2714 |

Interpolation OOD case

| Policy | Expert | Before adaptation | After adaptation |
| --- | --- | --- | --- |
| No OOD | 0.1183 | 0.2247 | 0.1317 |
| intra OOD full | 0.1883 | 0.3352 | 0.2461 |
| extra OOD full | 0.2446 | 0.4018 | 0.3160 |
| only intra res dyn | 0.1008 | 0.2032 | 0.1161 |
| only extra res dyn | 0.1134 | 0.2205 | 0.1313 |
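One way to read the two OOD settings above: in the extrapolation case, test parameters lie beyond the boundaries of the training range, while in the interpolation case a band inside the range is held out of training and used only for testing. The sketch below illustrates that reading for a single scalar parameter. All names, ranges, and the `margin` and `hole` arguments are hypothetical and not taken from the repository.

```python
import random


def sample_train(rng, lo, hi, hole=None):
    """Sample a training parameter from [lo, hi], skipping a held-out band."""
    while True:
        p = rng.uniform(lo, hi)
        if hole is None or not (hole[0] <= p <= hole[1]):
            return p


def sample_extrapolation_ood(rng, lo, hi, margin):
    """Sample at or beyond a boundary of [lo, hi], at most `margin` away."""
    if rng.random() < 0.5:
        return rng.uniform(lo - margin, lo)
    return rng.uniform(hi, hi + margin)


def sample_interpolation_ood(rng, hole):
    """Sample inside the band that was held out of training."""
    return rng.uniform(hole[0], hole[1])
```

With `rng = random.Random(0)` and `hole = (-0.2, 0.2)`, `sample_train(rng, -1, 1, hole)` never returns a value inside the band, so every draw from `sample_interpolation_ood(rng, hole)` is unseen during training even though it lies within the overall range.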

Visualize environment encoder mapping.

| 50% | 70% | 100% |
| --- | --- | --- |
| image-20230201223348617 | image-20230201222815306 | image-20230201222847866 |

| Center | Left | Out | Full |
| --- | --- | --- | --- |
| image-20230202085056463 | image-20230202084827287 | image-20230202084230655 | image-20230202085024035 |

Test case: compress 2d/3d/4d residue dynamic parameters to a 2d embedding with the environment encoder.

| 2d | 3d | 4d |
| --- | --- | --- |
| image-20230202113513153 | dim3=0 image-20230202113616929 | dim3,4=[0,0] image-20230202113846326 |
| | dim3=0.5 image-20230202113707042 | dim3,4=[1,0] image-20230202113911613 |
| | dim3=1.0 image-20230202113730947 | dim3,4=[1,1] image-20230202114036406 |

Conclusion: for OOD parameters, the encoder embedding space is still continuous.

2 residue dynamic parameters

| none OOD | OOD | OOD |
| --- | --- | --- |
| image-20230202132202434 | image-20230202132729289 | image-20230202132253316 |

No compressor visualization.

3 disturbance values

visualize disturbance mapping

image-20230208161313027

| mass | disturb | decay |
| --- | --- | --- |
| image-20230209191434616 | image-20230209191535401 | image-20230209191605429 |
| resdyn & force scale | ⚠️ force scale | ⚠️ resdyn |
| image-20230209191700594 | image-20230209191921926 | image-20230209192134061 |

| mass | disturb | decay |
| --- | --- | --- |
| image-20230209192312951 | image-20230209192406597 | image-20230209192350901 |
| resdyn & force scale | force scale | resdyn |
| image-20230209192259880 | image-20230209192609950 | image-20230209192701873 |

| None OOD mass_max | OOD mass_max | None OOD disturb max | OOD disturb max |
| --- | --- | --- | --- |
| image-20230209194717184 | image-20230209194949203 | image-20230209195417238 | image-20230209195208873 |
| None OOD decay_max | OOD decay_max | None OOD res param | OOD res param |
| image-20230209195613877 | image-20230209195743420 | image-20230209200158135 | image-20230209200018113 |
| None OOD force scale | OOD force scale | None OOD all max | OOD all max |
| image-20230209200355920 | image-20230209200537736 | image-20230209200957773 | image-20230209200807767 |

image-20230208163600740

image-20230208171803627 image-20230208171816339 image-20230208171854275

Check with other kinds of parameters.

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| image-20230204213708462 | image-20230204213758802 | image-20230204213928027 | image-20230204214019905 | image-20230204214145248 |

PCA analysis with higher dimensional parameters

image-20230204215957941

OOD performance evaluation

Performance in OOD case

image-20230209215050383
| Case | Plot |
| --- | --- |
| OOD | image-20230211091321301 |
| none OOD | image-20230211092143935 |

| Training set | left boundary (OOD) | center parameter | right boundary (OOD) |
| --- | --- | --- | --- |
| 100% | image-20230211092846137 | image-20230210163333948 | image-20230211092937932 |
| 50% | image-20230211092727242 | image-20230210164433384 | image-20230211092553000 |

🔧 Velocity3: Model Mismatch Cases.

Training without Residue model.

$f(v,u,w)$

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=1$ | 0.7379 | 0.8989 | 0.6864 | 0.4900 |
| $d_w=1$ C-4 | 0.6545 | 0.5992 | 0.5941 | |

Plots ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot, ppo_mlp_expert_Dw1_vis

$f(v,u)$

| Setting | Expert | RMA before adaptation | RMA after adaptation | Vanilla (Robust) |
| --- | --- | --- | --- | --- |
| $d_w=1$ | 0.0685 | 0.0808 | 0.1392 | 0.0872 |
| $d_w=1$ C-4 | 0.0659 | 0.0900 | 0.1254 | |

Plot ($d_w=1$ C-4): ppo_mlp_expert_Dw1_plot

Training with a simplified model

Force Scale =[3, 3, 3]

Training with fitted model (32 trajectories, mean error=0.29). Fails to stabilize.

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.4646 | 0.3908 | 0.2465 |

Training with fitted model (128 trajectories, mean error=0.07)

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.6057 | 0.5747 | 0.5860 |

Training with fitted model (512 trajectories, mean error=0.01)

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.1654 | 0.5242 | 0.3003 |

Training with true model

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.0718 | 0.0588 | 0.0355 |

Force Scale =[2, 4, 2]

Training with fitted model (32 trajectories, mean error=0.327). Fails to stabilize.

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 1.4253 | 0.5395 | 0.7555 |

Training with fitted model (64 trajectories, mean error=0.102)

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.1978 | 0.1346 | 0.1117 |

Training with fitted model (128 trajectories, mean error=0.033)

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.3220 | 0.2422 | 0.1854 |

Training with true model

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.0781 | 0.0703 | 0.0447 |

Training with dropout model

| $d_w=1$ Expert | $d_w=1$ RMA before adaptation | $d_w=1$ RMA after adaptation |
| --- | --- | --- |
| 0.1910 | 0.1951 | 0.1660 |

TODO :ballot_box: try simplified wind vs. real wind.

jc-bao commented 1 year ago

▶️ Vector: study the sim2real setting

:wrench: PyBullet-based lab environment setup

:wrench: Crazyswarm setup

jc-bao commented 1 year ago

🚗 Progress @2023.5.18

🗺️ Big Picture

✈️ Tasks

🥅 Next step

🧷 Check list