Closed by jc-bao 1 year ago
Test the training result in vanilla mode.
0.0746 | 0.0683 | 0.0737 |
Assumption: the failure is possibly related to the compressor.
0.0852 -> 0.0459 -> 0.0473
The problem still exists.
Pseudocode
data = collect_data(network=net1)
update_network(data, net1)
evaluate_network(net1)
data_2 = collect_data(network=net2)
update net2 with loss: Norm(net1(data_2) - net2(data_2))
evaluate_network(net2)
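The two-stage procedure above can be sketched as a small runnable toy. Everything here is an assumption made for illustration, not the project's actual code: the `Linear` class, the `target` function, the seeds, and the data generator are hypothetical stand-ins, and `collect_data` does not roll out a network as the real pipeline does. The sketch only shows the shape of the procedure: train the expert (`net1`) with access to a hidden parameter, then train the adaptor (`net2`) on fresh data to match the expert's outputs. It does not reproduce the numbers reported in this thread.

```python
import random

def collect_data(n, seed):
    """Hypothetical stand-in for a rollout: (observation, hidden_param) pairs."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

def target(obs, hidden):
    # Toy ground truth (an assumption); depends on a parameter the adaptor cannot see.
    return 2.0 * obs + 0.5 * hidden

class Linear:
    """y = w . x, trained by per-sample gradient descent on squared error."""
    def __init__(self, dim):
        self.w = [0.0] * dim

    def __call__(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def fit(self, xs, ys, lr=0.1, epochs=200):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self(x) - y
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]

def mse(net, xs, ys):
    return sum((net(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Stage 1: the expert (net1) sees both the observation and the hidden parameter.
data = collect_data(200, seed=0)
xs1 = [(o, h) for o, h in data]
ys1 = [target(o, h) for o, h in data]
net1 = Linear(2)
net1.fit(xs1, ys1)
expert_error = mse(net1, xs1, ys1)

# Stage 2: the adaptor (net2) sees only the observation and is fitted to
# net1's outputs on fresh data, i.e. it minimizes |net1(data_2) - net2(data_2)|^2.
data_2 = collect_data(200, seed=1)
xs2 = [(o,) for o, _ in data_2]
teacher_out = [net1((o, h)) for o, h in data_2]
net2 = Linear(1)
net2.fit(xs2, teacher_out)
distill_loss = mse(net2, xs2, teacher_out)

print("expert error:", expert_error)
print("distillation loss:", distill_loss)
```

In this toy the adaptor cannot see the hidden parameter at all, so its distillation loss stays above zero; the real setup, where the adaptor has extra inputs or capacity, may behave differently.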
Without residual dynamics
0.0063 -> 0.0267 -> 0.0081
When some parameters are unobservable, the expert may not perform as well as the adaptor, since the adaptor has more flexibility.
By feeding more data to the simulator, the performance becomes normal:
0.0168 -> 0.0290 -> 0.0171
Is the adapted policy's performance better than the expert's?