jc-bao / policy-adaptation-survey

This repository compares prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

Expert policy in constant wind is not as good as expected. #3

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago

The expert policy's final performance is poor under the disturbance.

Curve plot visualization: [three attached images]

The expected error is 0.02m, while the current error is 0.1m. The learned policy sometimes just hovers at a non-zero point.

jc-bao commented 1 year ago
[two attached images]
jc-bao commented 1 year ago
[attached image]
jc-bao commented 1 year ago
[attached image]
jc-bao commented 1 year ago

Possible explanation: entanglement of the mass and the disturbance.

When the mass is fixed: the controller only needs to learn a constant compensation term.

When the mass varies: the controller needs to adapt to different force directions. The decay force coefficient might also differ across episodes.
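A minimal sketch of this entanglement, assuming a simple hover model where the required thrust is gravity compensation plus the wind force (the masses and wind value below are made up for illustration): with a fixed mass the policy can memorize one constant, but with a varying mass the same wind implies a different required thrust each episode.

```python
import numpy as np

def hover_thrust(mass, wind_force, g=9.81):
    """Upward force needed to hover under a constant wind disturbance."""
    return mass * g + wind_force

# Fixed mass: the compensation term is a single constant the policy can memorize.
fixed = hover_thrust(0.03, wind_force=0.01)

# Varying mass: the same wind now requires a different thrust per episode, so a
# non-adaptive policy can only output a compromise value between these targets.
varying = np.array([hover_thrust(m, wind_force=0.01) for m in (0.02, 0.03, 0.04)])
```

A robust (non-adaptive) policy that outputs something near the mean of `varying` will hover with a steady-state offset, which matches the observed convergence to a point near, but not at, the origin.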

It is still hard to explain why the policy tends to converge to a point close to the origin rather than to the true origin. Perhaps the policy has learned a robust control policy rather than an adaptive one. This would also explain why the margin between the expert policy and the vanilla policy is so small. The result does not conflict with the Drone RMA paper, since they do not take a time-varying constant force into consideration. Given the drone's mass and force scale, our result still aligns with their conclusion.

If this is true, our next research question could be: how can we embed the system's dynamics information effectively, so that the policy learns to use this extra information rather than collapsing into a robust control policy?
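One way to check whether a trained π(x, e) actually uses the dynamics embedding is a finite-difference sensitivity probe: perturb e and measure how much the action moves. The helper below is a hypothetical diagnostic, not part of the repository; `policy` is any callable `(obs, e) -> action`.

```python
import numpy as np

def sensitivity_to_embedding(policy, obs, e, eps=1e-3):
    """Finite-difference sensitivity of the action to the environment
    embedding e. A value near zero means the policy ignores e, i.e. it has
    collapsed into a robust (non-adaptive) controller."""
    base = policy(obs, e)
    grads = []
    for i in range(len(e)):
        e_pert = e.copy()
        e_pert[i] += eps
        grads.append((policy(obs, e_pert) - base) / eps)
    return np.linalg.norm(np.stack(grads))
```

For a toy linear policy `a = W_e @ e` this returns the Frobenius norm of `W_e` (finite differences are exact for linear maps), so tracking this quantity during training would show whether the embedding pathway is being used or zeroed out.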

jc-bao commented 1 year ago
Curve / Policy Plot / Policy Visualization for π(x) and π(x, e): [attached images]

After padding zeros to the vanilla policy's input:

[attached image]

The performance is even worse. (Could this just be variance?)
