Closed fuyw closed 2 years ago
Hi Ilya,

I have a small question about the orthogonal initialization of the policy function. In PyTorch's documentation, a default gain of 5/3 is used for the `tanh` activation function. If we set `tanh_squash_distribution = False`, do we then need to set the gain to 5/3 for the output layer of the policy network?

`means = nn.Dense(self.action_dim, kernel_init=default_init())(outputs)`

https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L54

Anyway, this does not matter in practice.
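For reference, here is a minimal sketch of what passing the tanh gain to an orthogonal initializer would look like in plain JAX (the `5/3` value is PyTorch's `nn.init.calculate_gain("tanh")`; the shape and usage below are illustrative, not the repository's code):

```python
import jax
import jax.numpy as jnp

# Orthogonal initializer scaled by the tanh gain 5/3.
# jax.nn.initializers.orthogonal takes the gain via its `scale` argument.
gain = 5.0 / 3.0
init = jax.nn.initializers.orthogonal(scale=gain)

key = jax.random.PRNGKey(0)
# Hypothetical layer shape: hidden_dim=64 -> action_dim=6.
w = init(key, (64, 6))

# The columns of an orthogonally initialized matrix are mutually
# orthogonal with norm equal to the gain, so W^T W = gain^2 * I.
gram = w.T @ w
print(bool(jnp.allclose(gram, gain**2 * jnp.eye(6), atol=1e-4)))
```

In Flax this `init` could be passed as `kernel_init` to `nn.Dense` in place of `default_init()`, which (in the linked code) uses the default scale of 1.0.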