Closed fuyw closed 2 years ago
Hi Ilya,

I have a small question about the orthogonal initialization of the policy function. In PyTorch's documentation, a default gain of 5/3 is used for the `tanh` activation function. If we set `tanh_squash_distribution = False`, do we then need to set the gain to 5/3 for the output layer of the policy network?

`means = nn.Dense(self.action_dim, kernel_init=default_init())(outputs)`

https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L54

Anyway, this does not matter in practice.
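For reference, here is a minimal sketch of what passing the tanh gain to an orthogonal initializer would look like in plain JAX (the `5/3` value is PyTorch's `nn.init.calculate_gain("tanh")`; the shape and usage below are illustrative, not the repository's code):

```python
import jax
import jax.numpy as jnp

# Orthogonal initializer scaled by the tanh gain 5/3.
# jax.nn.initializers.orthogonal takes the gain via its `scale` argument.
gain = 5.0 / 3.0
init = jax.nn.initializers.orthogonal(scale=gain)

key = jax.random.PRNGKey(0)
# Hypothetical layer shape: hidden_dim=64 -> action_dim=6.
w = init(key, (64, 6))

# The columns of an orthogonally initialized matrix are mutually
# orthogonal with norm equal to the gain, so W^T W = gain^2 * I.
gram = w.T @ w
print(bool(jnp.allclose(gram, gain**2 * jnp.eye(6), atol=1e-4)))
```

In Flax this `init` could be passed as `kernel_init` to `nn.Dense` in place of `default_init()`, which (in the linked code) uses the default scale of 1.0.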