rickstaa / LAC-TF2-TORCH-translation

Temporary repository to debug what goes wrong during the translation of the LAC algorithm from TF1 to Torch.

Translated code has similar problems as the spinningup based code #1

Closed: rickstaa closed this issue 4 years ago

rickstaa commented 4 years ago

Problem statement

When we compare the PyTorch translation with the TensorFlow translation, we see that the PyTorch version is not training:

PyTorch results

(training result plots attached)

TensorFlow results

(training result plots attached)

rickstaa commented 4 years ago

Differences that still exist between the (translated) PyTorch LAC and TensorFlow LAC

Squashed Gaussian network

The SquashedGaussian actor is too complex for a one-to-one translation. I therefore had to use the nn.Module class instead of the nn.Sequential class. While doing this, however, I found a small difference between the LAC code and the SAC (spinning up) class.

LAC returns a squashed deterministic action during inference.

In both Minghao's code and the code of Haarnoja et al. 2019 ([see L125](https://github.com/haarnoja/sac/blob/8258e33633c7e37833cc39315891e77adfbe14b2/sac/policies/gaussian_policy.py#L125)), the (deterministic) clipped_mu that comes out of the mu network is squashed with the Tanh function. In the spinning up version, this is not done.

SAC version (L49)

mu = self.mu_layer(net_out)
clipped_mu = mu

LAC version (L244)

mu = tf.layers.dense(net_1, self.a_dim, activation= None, name='a', trainable=trainable)
clipped_mu = squash_bijector.forward(mu)
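To make the difference concrete, here is a minimal PyTorch sketch of the behaviour I mean; the class, layer sizes, and names are illustrative and not the actual repository code:

```python
import torch
import torch.nn as nn


class SquashedGaussianActorSketch(nn.Module):
    """Illustrative actor head; only the deterministic branch matters here."""

    def __init__(self, obs_dim, act_dim, hidden_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
        )
        self.mu_layer = nn.Linear(hidden_size, act_dim)
        self.log_std_layer = nn.Linear(hidden_size, act_dim)

    def forward(self, obs, deterministic=False):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        std = torch.exp(torch.clamp(self.log_std_layer(net_out), -20, 2))
        if deterministic:
            # LAC / Haarnoja: the deterministic mu is squashed with tanh
            # (squash_bijector.forward(mu) above); in the SAC snippet,
            # clipped_mu is left as mu at this point.
            return torch.tanh(mu)
        # Stochastic branch: sample and squash.
        return torch.tanh(torch.distributions.Normal(mu, std).rsample())
```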

Further, my class (like the spinning up class) already returns the log probability for the current action, whereas Minghao's code returns the current distribution and then calculates the log probability of the current action from this distribution. Based on my understanding, these two should be the same, right?
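As far as I can tell, the two conventions give the same number as long as the same unsquashed sample is evaluated; a small sanity check with made-up tensors (variable names are just for illustration):

```python
import torch

mu, std = torch.zeros(3), torch.ones(3)
dist = torch.distributions.Normal(mu, std)
u = dist.rsample()  # unsquashed action sample


# My class / spinning up style: compute the log probability immediately.
logp_direct = dist.log_prob(u).sum(-1)


# Minghao's style: return the distribution and evaluate the log probability later.
def log_prob_from_distribution(distribution, action):
    return distribution.log_prob(action).sum(-1)


logp_later = log_prob_from_distribution(dist, u)

print(torch.allclose(logp_direct, logp_later))  # True
```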

Question

Is the derived correction for the Tanh squashing also valid for the Lyapunov actor, and does it matter since this actor is not trained?

https://github.com/rickstaa/filter_LAC_tf_rewrite/blob/eb2edc310ea96bf4d68882f1037a79015b3fdbff/LAC/pytorch_a.py#L112-L125
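For reference, the correction I mean is the change-of-variables term for a = tanh(u), i.e. log pi(a|s) = log mu(u|s) - sum_i log(1 - tanh(u_i)^2). Below is my reading of the exact form and the numerically stable rewrite used by spinning up (a sketch, not the repository code):

```python
import math

import torch
import torch.nn.functional as F


def tanh_log_prob_correction(u):
    """Term subtracted from the Gaussian log probability when a = tanh(u)."""
    # Exact form: sum_i log(1 - tanh(u_i)^2), with a small epsilon for stability.
    exact = torch.log(1 - torch.tanh(u) ** 2 + 1e-6).sum(-1)
    # Equivalent, numerically stable rewrite:
    # log(1 - tanh(u)^2) = 2 * (log(2) - u - softplus(-2u)).
    stable = (2 * (math.log(2) - u - F.softplus(-2 * u))).sum(-1)
    return exact, stable
```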

Lyapunov network

Also, the output layer of the Lyapunov critic was too complex for a one-to-one translation. I therefore had to use the nn.Module class. Maybe something went wrong during my translation:

Translation

https://github.com/rickstaa/filter_LAC_tf_rewrite/blob/eb2edc310ea96bf4d68882f1037a79015b3fdbff/LAC/pytorch_l.py#L59-L63

Original

https://github.com/rickstaa/filter_LAC_tf_rewrite/blob/eb2edc310ea96bf4d68882f1037a79015b3fdbff/LAC/LAC_V1.py#L281-L298
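For context, my understanding of the original output layer is that the Lyapunov value is built as the sum of squares of the last hidden layer (which keeps it non-negative). A minimal PyTorch sketch of that idea, with placeholder layer sizes and names rather than the repository code:

```python
import torch
import torch.nn as nn


class LyapunovCriticSketch(nn.Module):
    """Illustrative Lyapunov critic: L(s, a) is the squared L2 norm of the
    last hidden layer, so the output is always non-negative."""

    def __init__(self, obs_dim, act_dim, hidden_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
        )

    def forward(self, obs, act):
        net_out = self.net(torch.cat([obs, act], dim=-1))
        # L(s, a) = sum over the last hidden layer of its squared activations.
        return torch.square(net_out).sum(dim=-1, keepdim=True)
```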

rickstaa commented 4 years ago

Further debug steps

Possible causes

rickstaa commented 4 years ago

Closed, as the same behaviour is present when translating the TF1 code to TF2 eager. The issue is not present when eager execution is disabled in TF2. The debugging continues in issue #9.
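For anyone running into the same thing, eager execution can be disabled in TF2 through the compat module:

```python
import tensorflow as tf

# With eager execution disabled, the translated TF2 code trains like
# the original TF1 version.
tf.compat.v1.disable_eager_execution()
```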