Closed rickstaa closed 4 years ago
Currently the results differ between the tf2 and tf2-eager code. I need to check the following things:
While doing this I found one bug :bug:.
First, I noticed that the second Gaussian Actor and Lyapunov Critic do not exist in the original version! They were introduced due to my misunderstanding of TensorFlow's reuse, trainable, and custom_getter arguments. Because of how the reuse
argument works, the following lines do not create a second Gaussian Actor and Lyapunov Critic but merely REUSE the already created ones:
lya_a_, _, _ = self._build_a(self.S_, reuse=True)
self.l_ = self._build_l(self.S_, lya_a_, reuse=True)
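In eager-style (or PyTorch-style) code the same weight sharing is achieved simply by calling the one network object twice instead of passing reuse=True. A minimal pure-Python sketch of that idea (the class and numbers are hypothetical stand-ins, not the actual actor network):

```python
class GaussianActor:
    """Toy stand-in for the actor network: a single shared weight."""

    def __init__(self, w=2.0):
        self.w = w  # shared parameter

    def __call__(self, state):
        # The real actor is an MLP; a scalar multiply keeps the sketch runnable.
        return self.w * state


actor = GaussianActor()
a = actor(1.0)       # action for the current state S
lya_a_ = actor(3.0)  # action for the next state S_; same weights, no second network
print(a, lya_a_)     # 2.0 6.0
```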
The only difference from the lines that create the original Gaussian Actor and Lyapunov Critic is that we now feed in the next state instead of the current state:
self.a, self.deterministic_a, self.a_dist = self._build_a(self.S)
self.l = self._build_l(self.S, self.a_input)
The reason we do this instead of using the Target Gaussian Actor and Lyapunov Critic is that we don't want to use the Exponential Moving Average weights when we are optimizing lambda and the Actor. This makes total sense when compared with the formulas used in Han et al. 2019.
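For contrast, the target networks used elsewhere in the algorithm are updated with an exponential moving average (Polyak averaging) of the main-network weights. A minimal sketch of that update, using one common convention and illustrative values (tau and the weight lists are hypothetical, not taken from the actual code):

```python
def polyak_update(target_weights, main_weights, tau=0.005):
    """EMA/Polyak update: target <- (1 - tau) * target + tau * main."""
    return [(1 - tau) * t + tau * m for t, m in zip(target_weights, main_weights)]


target = [0.0, 0.0]
main = [1.0, 1.0]
target = polyak_update(target, main, tau=0.5)
print(target)  # [0.5, 0.5]
```

Because the target weights lag behind the main weights like this, they are the wrong thing to differentiate through when optimizing lambda and the Actor, which is why the reused (non-target) networks are evaluated on the next state instead.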
I also found an important distinction between the LAC TensorFlow-eager implementation and the PyTorch LAC implementation. The code that creates the Gaussian Actor and Lyapunov Critic in the PyTorch LAC implementation is based on the Spinning Up SAC implementation. That implementation assumes, when creating the networks, that the output layer is a fully connected layer with a relu activation function. In our LAC algorithm, it has to be a fully connected layer with a square activation function. We therefore have to make sure that the output layer size is not passed to the mlp function, as the output layer is already handled in the forward function of the network. If this is not done, the network ends up with an extra fully connected layer. I already fixed this early on in the machine_learning_control/LAC implementation, but when I translated the code again here, I thought it wise to write it down for future reference.
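To make the pitfall concrete, here is a toy sketch (the helper and sizes are hypothetical, not the actual Spinning Up code): an mlp builder creates one fully connected layer per consecutive pair in its size list, so appending the output size gives the network an extra layer on top of the output layer that forward() builds itself:

```python
def count_mlp_layers(sizes):
    """A size list [in, h1, ..., hN] yields one fully connected layer per pair."""
    return len(sizes) - 1


obs_dim, hidden_sizes, out_dim = 8, [256, 256], 1

# Wrong: output size passed to the mlp builder -> one layer too many,
# since the network's forward() constructs the output layer separately.
wrong = count_mlp_layers([obs_dim] + hidden_sizes + [out_dim])

# Right: only the hidden sizes go to the builder.
right = count_mlp_layers([obs_dim] + hidden_sizes)

print(wrong, right)  # 3 2
```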
Completed in 7382534.
To keep the code as close as possible to the PyTorch version, we need to translate it to TF2 eager code.