Closed rickstaa closed 4 years ago
Currently the results differ between the tf2 and tf2-eager code. I need to check the following things:
While doing this I found one bug :bug:.
First, I noticed that the second Gaussian Actor and Lyapunov Critic do not exist in the original version! They were introduced due to my misunderstanding of TensorFlow's reuse, trainable, and custom_getter arguments. Because of how the reuse
argument works, the following lines do not create a second Gaussian Actor and Lyapunov Critic but merely REUSE the already created ones:
lya_a_, _, _ = self._build_a(self.S_, reuse=True)
self.l_ = self._build_l(self.S_, lya_a_, reuse=True)
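In eager-style (or PyTorch-style) code the same weight sharing is achieved simply by calling the one network object twice instead of passing reuse=True. A minimal pure-Python sketch of that idea (the class and numbers are hypothetical stand-ins, not the actual actor network):

```python
class GaussianActor:
    """Toy stand-in for the actor network: a single shared weight."""

    def __init__(self, w=2.0):
        self.w = w  # shared parameter

    def __call__(self, state):
        # The real actor is an MLP; a scalar multiply keeps the sketch runnable.
        return self.w * state


actor = GaussianActor()
a = actor(1.0)       # action for the current state S
lya_a_ = actor(3.0)  # action for the next state S_; same weights, no second network
print(a, lya_a_)     # 2.0 6.0
```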
The only difference from the lines that create the original Gaussian Actor and Lyapunov Critic is that we now feed in the next state instead of the current state:
self.a, self.deterministic_a, self.a_dist = self._build_a(self.S)
self.l = self._build_l(self.S, self.a_input)
The reason we do this instead of using the Target Gaussian Actor and Lyapunov Critic is that we don't want to use the Exponential Moving Average weights when we are optimizing lambda and the Actor. This makes total sense when compared with the formulas used in Han et al. 2019.
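For contrast, the target networks used elsewhere in the algorithm are updated with an exponential moving average (Polyak averaging) of the main-network weights. A minimal sketch of that update, using one common convention and illustrative values (tau and the weight lists are hypothetical, not taken from the actual code):

```python
def polyak_update(target_weights, main_weights, tau=0.005):
    """EMA/Polyak update: target <- (1 - tau) * target + tau * main."""
    return [(1 - tau) * t + tau * m for t, m in zip(target_weights, main_weights)]


target = [0.0, 0.0]
main = [1.0, 1.0]
target = polyak_update(target, main, tau=0.5)
print(target)  # [0.5, 0.5]
```

Because the target weights lag behind the main weights like this, they are the wrong thing to differentiate through when optimizing lambda and the Actor, which is why the reused (non-target) networks are evaluated on the next state instead.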
I also found an important distinction between the LAC TensorFlow-eager implementation and the PyTorch LAC implementation. The code that creates the Gaussian Actor and Lyapunov Critic in the PyTorch LAC implementation is based on the Spinning Up SAC implementation. That implementation assumes, when creating the networks, that the output layer is a fully connected layer with a relu activation function. In our LAC algorithm, it has to be a fully connected layer with a square activation function. We therefore have to make sure that the output layer size is not passed to the mlp function, as the output layer is already handled in the forward function of the network. If this is not done, the network ends up with an extra fully connected layer. I already fixed this early on in the machine_learning_control/LAC implementation, but when I translated the code again here, I thought it wise to write it down for future reference.
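To make the pitfall concrete, here is a toy sketch (the helper and sizes are hypothetical, not the actual Spinning Up code): an mlp builder creates one fully connected layer per consecutive pair in its size list, so appending the output size gives the network an extra layer on top of the output layer that forward() builds itself:

```python
def count_mlp_layers(sizes):
    """A size list [in, h1, ..., hN] yields one fully connected layer per pair."""
    return len(sizes) - 1


obs_dim, hidden_sizes, out_dim = 8, [256, 256], 1

# Wrong: output size passed to the mlp builder -> one layer too many,
# since the network's forward() constructs the output layer separately.
wrong = count_mlp_layers([obs_dim] + hidden_sizes + [out_dim])

# Right: only the hidden sizes go to the builder.
right = count_mlp_layers([obs_dim] + hidden_sizes)

print(wrong, right)  # 3 2
```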
Completed in 7382534.
To keep the code as close as possible to the PyTorch version, we need to translate it to TF2 eager code.