rickstaa / stable-learning-control

A framework for training theoretically stable (and robust) Reinforcement Learning control algorithms.
https://rickstaa.dev/stable-learning-control
MIT License

Test new DLAC and LSAC architectures #36

Closed rickstaa closed 3 years ago

rickstaa commented 4 years ago

In this issue, the results of two new architectures, DLAC and LSAC, are compared with the original LAC algorithm. I use the oscillator environment for this comparison and set both the environment and algorithm seeds to 0.

LAC results

The original LAC algorithm gives the following results:

Good policy image

Bad policy image

Conclusion

DLAC results

In the Double-Lyapunov Actor-Critic (DLAC), two Lyapunov critics are used instead of one. The maximum of the two L values is then used to calculate the actor loss. This is similar to the double-Q trick used in the original SAC algorithm, which takes the minimum of two Q values instead.
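A minimal sketch of the combination step described above, in plain Python so it runs without a deep-learning framework. The function names and the batch values are hypothetical and not taken from the repository. Taking the maximum is presumably the conservative choice here because L estimates a cost rather than a reward, which mirrors SAC taking the minimum of two Q values.

```python
from typing import List, Sequence


def combine_lyapunov_estimates(l1: Sequence[float], l2: Sequence[float]) -> List[float]:
    """Element-wise maximum of the two Lyapunov critics' outputs (DLAC idea)."""
    if len(l1) != len(l2):
        raise ValueError("both critics must score the same batch")
    return [max(a, b) for a, b in zip(l1, l2)]


def combine_q_estimates(q1: Sequence[float], q2: Sequence[float]) -> List[float]:
    """Element-wise minimum of two Q critics' outputs (SAC double-Q trick)."""
    if len(q1) != len(q2):
        raise ValueError("both critics must score the same batch")
    return [min(a, b) for a, b in zip(q1, q2)]


# Hypothetical critic outputs for a batch of three (state, action) pairs.
critic_1 = [0.8, 1.5, 0.2]
critic_2 = [1.1, 1.2, 0.2]

l_for_actor = combine_lyapunov_estimates(critic_1, critic_2)
q_for_actor = combine_q_estimates(critic_1, critic_2)
print(l_for_actor)  # [1.1, 1.5, 0.2]
print(q_for_actor)  # [0.8, 1.2, 0.2]
```

Only the combined value would enter the actor loss; the rest of the loss is unchanged and is not shown here.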

Results

In its current form, the critic of the double-Lyapunov soft actor is not able to train. I think, however, that this is due to an error in the implementation. I will postpone researching this architecture until the PyTorch version is fully ready, since it is easier to debug there.

LSAC

The Lyapunov Soft Actor-Critic (LSAC; couldn't think of a better name) contains both a Lyapunov critic and a normal soft critic. The outputs of both critics are then combined in the loss function for the policy:

image

Results

rickstaa commented 4 years ago

LSAC automatic temperature tuning

Now let's add an additional Lagrange multiplier for the contribution of the Q networks.

Sigma direction investigation

When I implemented the temperature variable (sigma) for the value component of the actor loss function, I noticed that this Lagrange multiplier sometimes increases and sometimes decreases.

image

Next, let's minimize the following equality constraint:

image
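One possible reason a multiplier like sigma can move in either direction is that, under an equality constraint, the update follows the sign of the constraint residual. The sketch below illustrates this with a log-parametrised multiplier and a loss of the same form as SAC's temperature loss. The residual values are made up, and the actual constraint is the one in the image above, which is not reproduced here.

```python
import math


def update_log_sigma(log_sigma: float, residual: float, lr: float = 0.1) -> float:
    """One gradient-descent step on J(log_sigma) = -log_sigma * residual.

    dJ/dlog_sigma = -residual, so the step is log_sigma + lr * residual:
    a positive residual raises sigma and a negative residual lowers it.
    """
    return log_sigma + lr * residual


# Hypothetical residuals (constraint value minus its target) over three updates.
log_sigma = 0.0  # corresponds to sigma = 1.0
for residual in (0.5, 0.5, -1.5):
    log_sigma = update_log_sigma(log_sigma, residual)
    print(f"residual={residual:+.1f} -> sigma={math.exp(log_sigma):.3f}")
```

If the residual changes sign during training, sigma changes direction with it, so seeing both behaviours is not by itself evidence of a bug.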

Results

Sigma increases

image

image

Sigma decreases

image

Good policy image image image image

Bad policy image image image

rickstaa commented 3 years ago

Closed as this is not on the immediate roadmap.