The SquashedGaussian actor is too complex for a one-to-one translation, so I had to use the nn.Module class instead of the nn.Sequential class. While doing this, however, I found a small difference between the LAC code and the SAC (Spinning Up) class.
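For context, a minimal sketch of why nn.Sequential does not suffice here: the actor needs two output heads and distribution logic in its forward pass. Layer sizes and names below are illustrative, not the actual translation:

```python
import torch
import torch.nn as nn
from torch.distributions.normal import Normal

class SquashedGaussianActor(nn.Module):
    """Sketch of an nn.Module squashed Gaussian actor (illustrative sizes)."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(256, 256)):
        super().__init__()
        sizes = [obs_dim] + list(hidden_sizes)
        layers = []
        for i in range(len(sizes) - 1):
            layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*layers)  # the shared trunk can stay sequential
        self.mu_layer = nn.Linear(sizes[-1], act_dim)       # two output heads plus
        self.log_std_layer = nn.Linear(sizes[-1], act_dim)  # sampling logic cannot

    def forward(self, obs, deterministic=False):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        log_std = torch.clamp(self.log_std_layer(net_out), -20, 2)
        pi_distribution = Normal(mu, torch.exp(log_std))
        pi_action = mu if deterministic else pi_distribution.rsample()
        return torch.tanh(pi_action), pi_distribution
```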
In both Minghao's code and the code of Haarnoja et al. 2019 ([see L125](https://github.com/haarnoja/sac/blob/8258e33633c7e37833cc39315891e77adfbe14b2/sac/policies/gaussian_policy.py#L125)), the (deterministic) clipped_mu that comes from the mu network is squashed with the Tanh function. In the Spinning Up version, this is not done.
Spinning Up version:

```python
mu = self.mu_layer(net_out)
clipped_mu = mu
```
Minghao's version:

```python
mu = tf.layers.dense(net_1, self.a_dim, activation=None, name='a', trainable=trainable)
clipped_mu = squash_bijector.forward(mu)
```
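To make the difference concrete, this is what squashing the deterministic mu would look like on the PyTorch side. The names `net_out` and `mu_layer` follow the Spinning Up snippet above, and I assume a plain torch.tanh is equivalent to the forward pass of the squash_bijector:

```python
import torch

def clipped_mu_from(net_out, mu_layer, squash=True):
    """Deterministic action; squash=True mirrors the LAC/Haarnoja behaviour,
    squash=False mirrors the Spinning Up snippet above."""
    mu = mu_layer(net_out)
    return torch.tanh(mu) if squash else mu
```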
Further, my class (like the Spinning Up class) already returns the log probability of the current action, whereas Minghao returns the current distribution and then calculates the log probability of the current action from this distribution. Based on my understanding, these two should be the same, right? A minimal sketch of the two variants is below.
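Assuming both variants take the log probability of the same (pre-squash) sample and apply the same Tanh correction, they should agree:

```python
import torch
from torch.distributions.normal import Normal

dist = Normal(torch.zeros(3), torch.ones(3))  # pre-squash policy distribution
u = dist.rsample()                            # pre-squash sample
a = torch.tanh(u)                             # squashed action

def tanh_corrected_logp(dist, u, a):
    # log pi(a|s) = log mu(u|s) - sum_i log(1 - tanh(u_i)^2)
    return dist.log_prob(u).sum() - torch.log(1 - a.pow(2) + 1e-6).sum()

# Variant A: the actor computes and returns the log prob itself.
logp_a = tanh_corrected_logp(dist, u, a)

# Variant B: the actor returns `dist`; the caller computes the log prob
# from it afterwards. Same inputs, same formula, same value.
logp_b = tanh_corrected_logp(dist, u, a)

assert torch.allclose(logp_a, logp_b)
```

The only pitfall would be computing the log probability of a different sample than the one that was actually executed.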
Is the derived correction for the Tanh squashing also valid for the Lyapunov actor, and does it matter given that this actor is not trained?
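For reference, the correction in question subtracts the Tanh change-of-variables term from the Gaussian log likelihood. A numerically stable form (the one Spinning Up uses) avoids evaluating log(1 - tanh(u)^2) directly:

```python
import numpy as np
import torch.nn.functional as F

def squash_correction(u):
    """Term to subtract from the pre-squash log prob for a = tanh(u).

    Identity: log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)),
    summed over the action dimensions.
    """
    return (2 * (np.log(2) - u - F.softplus(-2 * u))).sum(axis=-1)
```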
Also, the output layer of the Lyapunov critic was too complex, so I had to use nn.Module there as well. Maybe something went wrong during my translation:
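A minimal sketch of such an nn.Module Lyapunov critic, assuming the LAC convention that L(s, a) is the sum of squares of the last layer (which keeps it non-negative); layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class LyapunovCritic(nn.Module):
    """Sketch of a Lyapunov critic with a sum-of-squares output layer."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(256, 256)):
        super().__init__()
        sizes = [obs_dim + act_dim] + list(hidden_sizes)
        layers = []
        for i in range(len(sizes) - 1):
            layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, obs, act):
        out = self.net(torch.cat([obs, act], dim=-1))
        return torch.square(out).sum(dim=-1)  # non-negative scalar per sample
```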
Closed, as the same behaviour is present when translating TF1 code to TF2 eager. The issue is not present when eager execution is disabled in TF2. The debugging continues in issue #9.
Problem statement
When we compare the PyTorch translation with the TensorFlow translation, we see that the PyTorch version is not training:

PyTorch results

TensorFlow results