jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods in the control and learning communities.
Apache License 2.0

Policy after adaptation performs better than expert #13

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago
| Expert | Before adapt | After adapt |
| --- | --- | --- |
| $0.0629 \pm 0.0635$ | $0.0482 \pm 0.0400$ | $0.0403 \pm 0.0453$ |

Why does the adapted policy perform better than the expert?
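Before looking for a bug, it may help to check whether the gap is larger than the evaluation noise. The sketch below computes Welch's t statistic for the expert and after-adaptation results. It assumes the reported `±` values are standard deviations over evaluation episodes, and the episode counts `n` are hypothetical, since the thread does not state how many episodes were run.

```python
import math


def welch_t(mean_a, std_a, mean_b, std_b, n):
    """Welch's t statistic for two groups of n samples each."""
    standard_error = math.sqrt(std_a**2 / n + std_b**2 / n)
    return (mean_a - mean_b) / standard_error


# Values reported in the table: expert vs. after adaptation.
expert_mean, expert_std = 0.0629, 0.0635
adapted_mean, adapted_std = 0.0403, 0.0453

# Hypothetical episode counts; the real count is not given in the thread.
for n in (10, 100, 1000):
    t = welch_t(expert_mean, expert_std, adapted_mean, adapted_std, n)
    print(f"n={n:5d}  t={t:.2f}")
```

With only 10 episodes per policy the statistic stays below 1, so under these assumptions the difference could be noise alone. Whether it is a real effect depends on the actual number of evaluation episodes.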

jc-bao commented 1 year ago

Possible cause: the policies were evaluated with different methods.

Testing the training result in vanilla mode instead gives:

0.0746 -> 0.0683 -> 0.0737
jc-bao commented 1 year ago

Sanity check: without compressor

Hypothesis: the problem may be caused by the compressor.

0.0852 -> 0.0459 -> 0.0473

The problem persists, so the compressor is not the cause.

jc-bao commented 1 year ago

Pseudocode

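# Stage 1: train and evaluate net1 on its own data
data = collect_data(network=net1)
update_network(data, net1)
evaluate_network(net1)

# Stage 2: train net2 to match net1's outputs on data collected with net2
data_2 = collect_data(network=net2)
update_network(data_2, net2, loss=Norm(net1(data_2) - net2(data_2)))
evaluate_network(net2)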
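The two stages above can be sketched as a small runnable example. This is only an illustration under strong assumptions, not the repository's training code: both networks are linear policies fitted by least squares, and `collect_data`, `fit_linear`, and `evaluate` are hypothetical stand-ins. Unlike the pseudocode, data collection here does not depend on the policy being trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def collect_data(n=256):
    """Hypothetical stand-in for a rollout: random states and target actions."""
    states = rng.normal(size=(n, 3))
    targets = states @ np.array([0.5, -1.0, 2.0])
    return states, targets


def fit_linear(inputs, targets):
    """Least-squares fit; returns the weight vector of a linear policy."""
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights


def evaluate(weights, n=1024):
    """Mean absolute error of a linear policy on freshly collected data."""
    states, targets = collect_data(n)
    return float(np.mean(np.abs(states @ weights - targets)))


# Stage 1: train net1 on its own data.
states, targets = collect_data()
net1 = fit_linear(states, targets)

# Stage 2: train net2 to minimize ||net1(data_2) - net2(data_2)||.
states_2, _ = collect_data()
net2 = fit_linear(states_2, states_2 @ net1)

print("net1 error:", evaluate(net1))
print("net2 error:", evaluate(net2))
```

In this noise-free linear setting net2 simply reproduces net1, so the two errors coincide up to floating-point precision. A student that scores clearly better than its teacher would then point toward the evaluation or the data rather than the matching loss itself, though that reasoning may not carry over to the full setup.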
jc-bao commented 1 year ago

Sanity check

Without residual dynamics

0.0063 -> 0.0267 -> 0.0081

jc-bao commented 1 year ago

Possible explanation

When some parameters are unobservable, the expert may not be as good as the adaptor, since the adaptor has more flexibility.

Verification: if the adaptor is allowed to update the policy during the adaptation stage, does performance improve?

jc-bao commented 1 year ago

Problem Addressed

After feeding more data to the simulator, performance returns to normal:

0.0168 -> 0.0290 -> 0.0171