The SquashedGaussian actor is too complex for a one-to-one translation, so I had to use the nn.Module class instead of the nn.Sequential class. While doing this, however, I found a small difference between the LAC code and the SAC (Spinning Up) class.
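For context, a minimal sketch of why nn.Sequential does not suffice here: the actor needs two output heads and distribution logic in its forward pass. Layer sizes and names below are illustrative, not the actual translation:

```python
import torch
import torch.nn as nn
from torch.distributions.normal import Normal

class SquashedGaussianActor(nn.Module):
    """Sketch of an nn.Module squashed Gaussian actor (illustrative sizes)."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(256, 256)):
        super().__init__()
        sizes = [obs_dim] + list(hidden_sizes)
        layers = []
        for i in range(len(sizes) - 1):
            layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*layers)  # the shared trunk can stay sequential
        self.mu_layer = nn.Linear(sizes[-1], act_dim)       # two output heads plus
        self.log_std_layer = nn.Linear(sizes[-1], act_dim)  # sampling logic cannot

    def forward(self, obs, deterministic=False):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        log_std = torch.clamp(self.log_std_layer(net_out), -20, 2)
        pi_distribution = Normal(mu, torch.exp(log_std))
        pi_action = mu if deterministic else pi_distribution.rsample()
        return torch.tanh(pi_action), pi_distribution
```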
In both Minghao's code and the code of Haarnoja et al. 2019 ([see L125](https://github.com/haarnoja/sac/blob/8258e33633c7e37833cc39315891e77adfbe14b2/sac/policies/gaussian_policy.py#L125)), the (deterministic) clipped_mu that comes from the mu network is squashed with the Tanh function. In the Spinning Up version, this is not done.
Spinning Up version:

```python
mu = self.mu_layer(net_out)
clipped_mu = mu
```
Minghao's version:

```python
mu = tf.layers.dense(net_1, self.a_dim, activation=None, name='a', trainable=trainable)
clipped_mu = squash_bijector.forward(mu)
```
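To make the difference concrete, this is what squashing the deterministic mu would look like on the PyTorch side. The names `net_out` and `mu_layer` follow the Spinning Up snippet above, and I assume a plain torch.tanh is equivalent to the forward pass of the squash_bijector:

```python
import torch

def clipped_mu_from(net_out, mu_layer, squash=True):
    """Deterministic action; squash=True mirrors the LAC/Haarnoja behaviour,
    squash=False mirrors the Spinning Up snippet above."""
    mu = mu_layer(net_out)
    return torch.tanh(mu) if squash else mu
```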
Further, my class (like the Spinning Up class) already returns the log probability of the current action, whereas Minghao returns the current distribution and then calculates the log probability of the current action from this distribution. Based on my understanding, these two should be the same, right? A minimal sketch of the two variants is below.
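Assuming both variants take the log probability of the same (pre-squash) sample and apply the same Tanh correction, they should agree:

```python
import torch
from torch.distributions.normal import Normal

dist = Normal(torch.zeros(3), torch.ones(3))  # pre-squash policy distribution
u = dist.rsample()                            # pre-squash sample
a = torch.tanh(u)                             # squashed action

def tanh_corrected_logp(dist, u, a):
    # log pi(a|s) = log mu(u|s) - sum_i log(1 - tanh(u_i)^2)
    return dist.log_prob(u).sum() - torch.log(1 - a.pow(2) + 1e-6).sum()

# Variant A: the actor computes and returns the log prob itself.
logp_a = tanh_corrected_logp(dist, u, a)

# Variant B: the actor returns `dist`; the caller computes the log prob
# from it afterwards. Same inputs, same formula, same value.
logp_b = tanh_corrected_logp(dist, u, a)

assert torch.allclose(logp_a, logp_b)
```

The only pitfall would be computing the log probability of a different sample than the one that was actually executed.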
Is the derived correction for the Tanh squashing also valid for the Lyapunov actor, and does it matter given that this actor is not trained?
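For reference, the correction in question subtracts the Tanh change-of-variables term from the Gaussian log likelihood. A numerically stable form (the one Spinning Up uses) avoids evaluating log(1 - tanh(u)^2) directly:

```python
import numpy as np
import torch.nn.functional as F

def squash_correction(u):
    """Term to subtract from the pre-squash log prob for a = tanh(u).

    Identity: log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)),
    summed over the action dimensions.
    """
    return (2 * (np.log(2) - u - F.softplus(-2 * u))).sum(axis=-1)
```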
Also, the output layer of the Lyapunov critic was too complex, so I had to use nn.Module there as well. Maybe something went wrong during my translation:
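A minimal sketch of such an nn.Module Lyapunov critic, assuming the LAC convention that L(s, a) is the sum of squares of the last layer (which keeps it non-negative); layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class LyapunovCritic(nn.Module):
    """Sketch of a Lyapunov critic with a sum-of-squares output layer."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(256, 256)):
        super().__init__()
        sizes = [obs_dim + act_dim] + list(hidden_sizes)
        layers = []
        for i in range(len(sizes) - 1):
            layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, obs, act):
        out = self.net(torch.cat([obs, act], dim=-1))
        return torch.square(out).sum(dim=-1)  # non-negative scalar per sample
```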
Closed, as the same behaviour is present when translating TF1 code to TF2 eager. The issue is not present when eager execution is disabled in TF2. The debugging continues in issue #9.
Problem statement
When we compare the PyTorch translation with the TensorFlow translation, we see that the PyTorch version is not training:

PyTorch results

TensorFlow results