Good catch. The issue is in sampling (or storing) observations in the replay buffer. After sampling you should have obs of shape (32, 4), but for some reason it ends up as a vector of 32 elements. A recent PR added an "extend" version of add, but that did not modify the original add, and at a quick glance I do not see where the error could be.
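For reference, the expected batch shapes can be checked with a small stand-in buffer. This is a hypothetical sketch, not the stable-baselines ReplayBuffer: the class, the shape helper and the CartPole-sized observations (4 components) are assumptions for illustration. A correctly stored and sampled batch of 32 should give observations of shape (32, 4) and dones of shape (32,); a (32,) result where observations are expected suggests that a scalar column, such as dones, ended up in that slot.

```python
import random


class ReplayBuffer:
    """Hypothetical minimal ring-buffer replay memory (illustration only)."""

    def __init__(self, size):
        self._storage = []
        self._maxsize = size
        self._next_idx = 0

    def add(self, obs_t, action, reward, obs_tp1, done):
        # Store one transition; overwrite the oldest entry once the buffer is full.
        data = (obs_t, action, reward, obs_tp1, done)
        if self._next_idx >= len(self._storage):
            self._storage.append(data)
        else:
            self._storage[self._next_idx] = data
        self._next_idx = (self._next_idx + 1) % self._maxsize

    def sample(self, batch_size):
        # Draw indices with replacement and split the stored tuples into columns.
        idxes = [random.randrange(len(self._storage)) for _ in range(batch_size)]
        columns = ([], [], [], [], [])
        for i in idxes:
            for column, value in zip(columns, self._storage[i]):
                column.append(value)
        return columns


def shape(batch):
    """(rows, cols) for a batch of vectors, (rows,) for a batch of scalars."""
    if batch and isinstance(batch[0], (list, tuple)):
        return (len(batch), len(batch[0]))
    return (len(batch),)


buffer = ReplayBuffer(size=1000)
for _ in range(100):
    obs = [random.random() for _ in range(4)]       # CartPole-like 4-vector (assumed)
    next_obs = [random.random() for _ in range(4)]
    buffer.add(obs, random.randrange(2), 1.0, next_obs, False)

obses_t, actions, rewards, obses_tp1, dones = buffer.sample(32)
print(shape(obses_t), shape(dones))  # (32, 4) (32,)
```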
Well, I tried to investigate it a bit: obses_t and obses_tp1 have the correct shape (32, 4).
The problem is in dones: the train method tries to feed dones as input to the neural network.
I checked that by changing the size of dones, and the exception text changed too.
It seems the call signature has changed at some point. Looking at the DQN code that uses that tf function, obses_tp1 is provided twice (for some reason?), while this code does not do that.
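The way a missing argument pushes dones into the network can be reproduced without TensorFlow. The sketch below is a stand-in, not the stable-baselines API: TRAIN_INPUTS, OBS_INPUTS, fake_train and batch_shape are hypothetical names, and the seven-input order (next observations fed twice, then dones and weights) is an assumption based on the DQN call site mentioned above. Arguments are matched to inputs by position, so omitting one shifts everything after it left, dones lands in an observation input, and the error message echoes the shape of dones.

```python
# Assumed input order, inferred from a DQN call that passes obses_tp1 twice.
TRAIN_INPUTS = ("obs_t", "actions", "rewards", "obs_tp1", "double_obs", "dones", "weights")
OBS_INPUTS = {"obs_t", "obs_tp1", "double_obs"}
OBS_DIM = 4  # CartPole-sized observations (assumption)


def batch_shape(batch):
    """(rows, cols) for a batch of vectors, (rows,) for a batch of scalars."""
    if batch and isinstance(batch[0], list):
        return (len(batch), len(batch[0]))
    return (len(batch),)


def fake_train(*args):
    """Hypothetical stand-in for a compiled train function.

    The i-th argument is fed to the i-th input; only observation inputs are
    shape-checked, and unfed trailing inputs are not checked in this sketch.
    """
    assert len(args) <= len(TRAIN_INPUTS), "Too many arguments provided"
    for name, value in zip(TRAIN_INPUTS, args):
        if name in OBS_INPUTS and batch_shape(value)[1:] != (OBS_DIM,):
            raise ValueError(
                f"Cannot feed value of shape {batch_shape(value)} for input "
                f"{name!r}, which has shape (?, {OBS_DIM})")
    return "ok"


batch = 32
obses_t = [[0.0] * OBS_DIM for _ in range(batch)]
actions = [0] * batch
rewards = [1.0] * batch
obses_tp1 = [[0.0] * OBS_DIM for _ in range(batch)]
dones = [0.0] * batch
weights = [1.0] * batch

# Six arguments: dones shifts into the "double_obs" slot and the check fails.
try:
    fake_train(obses_t, actions, rewards, obses_tp1, dones, weights)
except ValueError as err:
    print(err)  # Cannot feed value of shape (32,) for input 'double_obs', which has shape (?, 4)

# Seven arguments, with obses_tp1 passed twice as in the DQN call site.
print(fake_train(obses_t, actions, rewards, obses_tp1, obses_tp1, dones, weights))  # ok
```

Changing the length of dones changes the shape reported in the message, which matches the behaviour observed when resizing dones.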
I think we should delete custom_cartpole: it is old code that does not follow the interface and the best practices from SB.
Regarding "(for some reason?)": I think it is a typo; I don't see any reason for it.
I will close this issue in favor of #812 as it has the fix in it.
I tried to execute this code from custom_cartpole.py using stable-baselines and tf 1.14.
Describe the bug
I investigated the problem a bit. Most likely the problem is that we are trying to feed dones to the network as input.