Open zlw21gxy opened 3 years ago
Yes, I also found this bug. But it only affects the speed of backprop; it does not change the policy gradient.
@twni2016 Hi twni2016, you mentioned that this only affects the speed of backprop; do you have any idea how to resolve the slow performance? I also found the SAC implementation extremely slow...
@Roadsong I don't think Spinning Up is very slow (maybe you have other issues). You can just follow @zlw21gxy's fix for this issue.
Another quick fix is using
q_params = list(itertools.chain(ac.q1.parameters(), ac.q2.parameters()))
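The difference is easy to demonstrate without any RL machinery. A minimal sketch, where the string lists are hypothetical stand-ins for ac.q1.parameters() / ac.q2.parameters():

```python
import itertools

# Hypothetical stand-ins for ac.q1.parameters() / ac.q2.parameters().
p1 = ["w1", "b1"]
p2 = ["w2", "b2"]

# Buggy version: a bare chain is a one-shot iterator.
q_params = itertools.chain(iter(p1), iter(p2))
consumed = list(q_params)        # e.g. the Adam constructor consuming it
assert consumed == ["w1", "b1", "w2", "b2"]
assert list(q_params) == []      # exhausted: later loops see nothing

# Fix: materialize into a list, which can be iterated any number of times.
q_params = list(itertools.chain(p1, p2))
assert list(q_params) == ["w1", "b1", "w2", "b2"]
assert list(q_params) == ["w1", "b1", "w2", "b2"]  # still works on a second pass
```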
This seems to be the same issue solved by two pending PRs:
For TD3: "Fix problem with empty iterator" #330
For SAC: "Fix Q-networks freezing in PyTorch SAC" #251
q_params = list(itertools.chain(ac.q1.parameters(), ac.q2.parameters()))
It also seems to be addressed with a lambda function in another pending PR, "Fixes sac critic grad freeze bug" #320:
q_params = lambda: itertools.chain(
*[gen() for gen in [ac.q1.parameters, ac.q2.parameters]]
)
(I must say that I have NOT tried them though, as I'm now working with PPO.)
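For the lambda variant, each call rebuilds the chain, so every caller gets a fresh iterator; note that call sites then have to use q_params() instead of q_params. A minimal sketch with hypothetical stand-ins for the bound methods ac.q1.parameters / ac.q2.parameters:

```python
import itertools

# Hypothetical stand-ins for ac.q1.parameters / ac.q2.parameters.
def q1_parameters():
    return iter(["w1", "b1"])

def q2_parameters():
    return iter(["w2", "b2"])

# The lambda rebuilds the chain on every call, so each caller gets a
# brand-new iterator instead of sharing one exhausted chain object.
q_params = lambda: itertools.chain(
    *[gen() for gen in [q1_parameters, q2_parameters]]
)

assert list(q_params()) == ["w1", "b1", "w2", "b2"]
assert list(q_params()) == ["w1", "b1", "w2", "b2"]  # fresh on every call
```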
The SAC algorithm in the PyTorch implementation has a serious bug
q_params = itertools.chain(ac.q1.parameters(), ac.q2.parameters())
itertools.chain returns a one-shot iterator that becomes empty after the first pass, so every later use of q_params iterates over an empty iterator. Concretely: initializing q_optimizer consumes the chain, after which q_params is exhausted, yet the code iterates over it several more times to set gradients on the Q-network parameters, and those loops silently do nothing.
A quick fix is using
q_params = list(itertools.chain(ac.q1.parameters(), ac.q2.parameters()))
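To see why the exhausted chain is harmful in SAC: the policy-update step freezes the Q-networks by looping over q_params to set requires_grad, but looping over an empty iterator is a no-op. A minimal sketch, where Param is a hypothetical stand-in for a torch parameter:

```python
import itertools

class Param:
    """Hypothetical stand-in for a torch.nn.Parameter."""
    def __init__(self):
        self.requires_grad = True

q1_params = [Param(), Param()]
q2_params = [Param(), Param()]

# Buggy version: the chain is already exhausted once the optimizer is built...
q_params = itertools.chain(iter(q1_params), iter(q2_params))
_ = list(q_params)  # stands in for q_optimizer = Adam(q_params, ...)

# ...so the freeze loop before the policy update silently does nothing.
for p in q_params:
    p.requires_grad = False
assert all(p.requires_grad for p in q1_params + q2_params)  # still unfrozen!

# Fixed version: a list can be iterated again, so freezing works.
q_params = list(itertools.chain(q1_params, q2_params))
for p in q_params:
    p.requires_grad = False
assert not any(p.requires_grad for p in q1_params + q2_params)
```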