jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods from both the control and learning communities.

Apache License 2.0

Policy after adaptation performs better than expert #13

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago

| Expert | Before adapt | After adapt |
| --- | --- | --- |
| $0.0629 \pm 0.0635$ | $0.0482 \pm 0.0400$ | $0.0403 \pm 0.0453$ |

Why does the adapted policy perform better than the expert?
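One quick check is whether the gap between the expert and the adapted policy is large relative to the reported spread. The sketch below computes Welch's t statistic from the table's values. It assumes the `±` terms are standard deviations over independent evaluation episodes. The episode counts are hypothetical, since the thread does not state them.

```python
import math

def welch_t(mean_a, std_a, mean_b, std_b, n_a, n_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# (mean, std) pairs as reported in the table.
expert = (0.0629, 0.0635)
adapted = (0.0403, 0.0453)

# The number of evaluation episodes is NOT given in the thread;
# these values are hypothetical and only illustrate the sensitivity to n.
for n in (10, 100):
    t = welch_t(*expert, *adapted, n, n)
    print(f"n={n}: t={t:.2f}")
```

With few episodes the statistic stays small, so a gap of this size could plausibly be evaluation noise. With many episodes the same gap would be harder to dismiss.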

jc-bao commented 1 year ago

Hypothesis: the gap is due to different evaluation methods.

Test: evaluate the training result in vanilla mode.


0.0746 -> 0.0683 -> 0.0737

Conclusion: apart from the prediction of e, the evaluation is the same.

jc-bao commented 1 year ago

Sanity check: without compressor

Hypothesis: the discrepancy may be caused by the compressor.

0.0852 -> 0.0459 -> 0.0473

The problem still exists.

jc-bao commented 1 year ago

Pseudocode

data = collect_data(network=net1)
update_network(data, net1)
evaluate_network(net1)

data_2 = collect_data(network=net2)
update net2 by minimizing Norm(net1(data_2) - net2(data_2))
evaluate_network(net2)
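The two stages in the pseudocode above can be sketched as a small runnable toy. This is not the repository's implementation: the `Linear` class, the sample fields `e`, `hist`, and `target`, and the toy task are all hypothetical, and the data here does not depend on the collecting network as a real rollout would. The assumption is that `net1` reads a privileged parameter while `net2` only sees a noisy proxy and is trained to match `net1`'s output.

```python
import random

class Linear:
    """Scalar linear model y = w * x + b that reads one field of a sample."""
    def __init__(self, key):
        self.key, self.w, self.b = key, 0.0, 0.0

    def __call__(self, sample):
        return self.w * sample[self.key] + self.b

def collect_data(network, n, rng):
    # Toy stand-in for a rollout: a real rollout would depend on `network`.
    data = []
    for _ in range(n):
        e = rng.uniform(-1.0, 1.0)        # hidden environment parameter
        hist = e + rng.gauss(0.0, 0.05)   # noisy observable proxy for e
        data.append({"e": e, "hist": hist, "target": 2.0 * e + 0.5})
    return data

def update_network(data, net, label_fn, lr=0.1, epochs=200):
    # Full-batch gradient descent on the mean squared error to label_fn(sample).
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s in data:
            err = net(s) - label_fn(s)
            grad_w += 2.0 * err * s[net.key] / len(data)
            grad_b += 2.0 * err / len(data)
        net.w -= lr * grad_w
        net.b -= lr * grad_b

def evaluate_network(net, data):
    # Mean squared error against the task target (lower is better).
    return sum((net(s) - s["target"]) ** 2 for s in data) / len(data)

rng = random.Random(0)
net1, net2 = Linear("e"), Linear("hist")

data = collect_data(net1, 500, rng)
update_network(data, net1, lambda s: s["target"])  # stage 1: fit net1 to the task

data_2 = collect_data(net2, 500, rng)
update_network(data_2, net2, net1)                 # stage 2: net2 imitates net1

test_data = collect_data(None, 500, rng)
print(evaluate_network(net1, test_data), evaluate_network(net2, test_data))
```

In this toy setup the imitating network ends up with a higher error than `net1`, which is the ordering one would normally expect and the reason the result in this issue looked suspicious.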

jc-bao commented 1 year ago

Sanity check: without residual dynamics

0.0063 -> 0.0267 -> 0.0081

jc-bao commented 1 year ago

Possible explanation

When some parameter is unobservable, the expert might not be as good as the adaptor, since the adaptor has more flexibility.

Verification: if the adaptor is allowed to update the policy during the adaptation stage, does the performance improve?

Update with regularization: 0.0772 -> 0.0474 -> 0.0482
Update with objective: 0.0772 -> 0.0975 -> 0.0504

jc-bao commented 1 year ago

Problem resolved

After feeding more data to the simulator, the results look as expected, with the expert at least as good as the adapted policy:

0.0168 -> 0.0290 -> 0.0171