ikostrikov / implicit_q_learning


A small problem #6

Closed: fuyw closed this 2 years ago

fuyw commented 2 years ago

Hi Ilya,

I have a small question about the orthogonal initialization of the policy function.

In PyTorch's documentation, the recommended gain for the tanh activation function is 5/3.
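For reference, here is a small check of that figure, assuming the 5/3 value I mean is the one returned by torch.nn.init.calculate_gain:

```python
import torch.nn.init as init

# PyTorch's recommended gain for tanh, as used with orthogonal_/xavier_ init.
gain = init.calculate_gain('tanh')
print(gain)  # 5/3 ≈ 1.6667
```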

If we set tanh_squash_distribution = False, do we then need to set the gain to 5/3 for the output layer of the policy network?

means = nn.Dense(self.action_dim, kernel_init=default_init())(outputs)

https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L54
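For concreteness, this is a minimal sketch of what I mean by passing that gain explicitly. It assumes default_init wraps Flax's orthogonal initializer with a configurable scale (as the helper in this repo appears to); PolicyHead is just an illustrative module name, not the actual policy class:

```python
import flax.linen as nn


def default_init(scale: float = 1.0):
    # Assumed to mirror the repo's helper: orthogonal init with a given scale/gain.
    return nn.initializers.orthogonal(scale)


class PolicyHead(nn.Module):  # hypothetical module, for illustration only
    action_dim: int

    @nn.compact
    def __call__(self, outputs):
        # With tanh_squash_distribution = False, the tanh gain of 5/3 could be
        # passed to the output layer's kernel initializer like this:
        means = nn.Dense(self.action_dim,
                         kernel_init=default_init(5.0 / 3.0))(outputs)
        return means
```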

Anyway, this does not matter in practice.