calerc opened this issue 3 years ago
This should be fixed in 2.10.1, so try installing stable-baselines==2.10.1
(see #787 and the changelog) and see if that works.
Installing stable-baselines==2.10.1 did not work. Looking at `TD3.learn()` in version 2.10.1:

- `self.env.step()` is called on line 330
- `callback.on_step()` is called on line 337
- `episode_rewards` is updated on line 394

Since `callback.on_step()` has access to the correct reward for the step, but not the correct reward for the episode, the problem could be solved by having the callback keep track of the episode rewards itself (see the sketch below). But it seems that calling `callback.on_step()` after `episode_rewards[-1] += reward_` (or the equivalent for other models) would be a more robust solution.
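A rough sketch of that callback-side workaround, assuming the step reward is exposed in `callback.locals` under `reward_` (as in TD3) or `reward` — which key exists, and its type, depends on the algorithm, which is exactly the fragility described above:

```python
from stable_baselines.common.callbacks import BaseCallback


class EpisodeRewardTracker(BaseCallback):
    """Workaround sketch: accumulate the episode return inside the callback
    instead of reading the (not yet updated) episode reward from locals."""

    def __init__(self, verbose=0):
        super(EpisodeRewardTracker, self).__init__(verbose)
        self.current_episode_reward = 0.0

    def _on_step(self):
        # Assumed key names: TD3 exposes `reward_` and `done`; other
        # algorithms use different names/types, which is the fragility
        # mentioned above.
        step_reward = self.locals.get("reward_", self.locals.get("reward", 0.0))
        done = self.locals.get("done", False)
        self.current_episode_reward += float(step_reward)
        if done:
            if self.verbose > 0:
                print("Episode return:", self.current_episode_reward)
            self.current_episode_reward = 0.0
        return True
```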
Hello,
If you want a robust way to retrieve the episode reward, you should use a `Monitor` wrapper together with a callback. This is what we do in Stable-Baselines3.
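A minimal sketch of that combination, assuming a Pendulum-v0 environment and an illustrative `EpisodeRewardCallback` name: the `Monitor` wrapper records the return of every completed episode, and the callback reads those returns back through `Monitor.get_episode_rewards()` (which only reports episodes that have already finished):

```python
import gym

from stable_baselines import TD3
from stable_baselines.bench import Monitor
from stable_baselines.common.callbacks import BaseCallback


class EpisodeRewardCallback(BaseCallback):
    """Reads completed-episode returns from the Monitor wrapper
    instead of relying on the algorithm's internal variables."""

    def __init__(self, monitor_env, verbose=0):
        super(EpisodeRewardCallback, self).__init__(verbose)
        self.monitor_env = monitor_env

    def _on_step(self):
        episode_rewards = self.monitor_env.get_episode_rewards()
        if len(episode_rewards) > 0 and self.verbose > 0:
            print("Last completed episode return:", episode_rewards[-1])
        return True


env = Monitor(gym.make("Pendulum-v0"), filename=None)
model = TD3("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=5000, callback=EpisodeRewardCallback(env, verbose=1))
```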
In fact, depending on what you really want to do, you could possibly use only a `gym.Wrapper`.
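A rough sketch of that wrapper-only variant (the `EpisodeReturnWrapper` name and the `episode_return` info key are illustrative): the wrapper accumulates the return itself and exposes it through the `info` dict, so nothing depends on the algorithm's internal variables:

```python
import gym


class EpisodeReturnWrapper(gym.Wrapper):
    """Accumulates the return of the current episode independently of the model."""

    def __init__(self, env):
        super(EpisodeReturnWrapper, self).__init__(env)
        self.episode_return = 0.0

    def reset(self, **kwargs):
        self.episode_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_return += reward
        # Expose the running return so it can be read from a callback
        # or inspected after training.
        info["episode_return"] = self.episode_return
        return obs, reward, done, info
```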
The following applies to DDPG and TD3, and possibly other models. These libraries were installed in a virtual environment:
numpy==1.16.4 stable-baselines==2.10.0 gym==0.14.0 tensorflow==1.14.0
Episode rewards do not seem to be updated in `model.learn()` before `callback.on_step()` is called. Depending on which `callback.locals` variable is used, this means that the episode reward available to the callback is stale: it does not yet include the reward of the current step.

Also, the `callback.locals` episode reward variables are different for DDPG and TD3, meaning that a callback meant to work with both models has to account for differences in episode reward variable names and types.

The following code reproduces the error for DDPG and TD3: