[Bug Report] Vector env return value

openai / gym


Describe the bug

Hi, I used the vector env API in gym to train on Atari PongNoFrameskip-v4. After the agent had interacted with the environment for a while, I noticed something strange: for some sub-environments the cumulative reward was 21.0, but the corresponding done flag was still False.

A concrete example:

rewards: [21.0, 19.0, 17.0, 21.0, 18.0, 21.0]
done: [True, False, False, True, False, False]

In this case, some envs that have reached the maximum reward of 21.0 are not marked done, so their cumulative reward would keep increasing. Is this behavior expected?
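To help narrow this down, here is a minimal sketch of how the cumulative reward could be tracked per sub-environment. It assumes the cumulative values are summed in the training loop from the per-step rewards, and that a sub-environment starts a new episode right after it reports done. The helper name `update_returns` and the sample numbers are hypothetical and not part of gym.

```python
def update_returns(running, step_rewards, dones):
    """Accumulate one step of rewards for each sub-environment.

    Hypothetical helper, not part of gym. Returns (new_running, finished),
    where finished[i] is the episode total if env i ended on this step,
    otherwise None. The running total of a finished env is reset to 0.0.
    """
    new_running, finished = [], []
    for total, reward, done in zip(running, step_rewards, dones):
        total += reward
        if done:
            # Episode ended: report its total and start counting from zero.
            finished.append(total)
            new_running.append(0.0)
        else:
            finished.append(None)
            new_running.append(total)
    return new_running, finished


# Made-up values for three sub-environments.
running = [20.0, 19.0, 20.0]
running, finished = update_returns(running, [1.0, 0.0, 0.0], [True, False, False])
print(running)   # [0.0, 19.0, 20.0]
print(finished)  # [21.0, None, None]
```

If the totals are not reset on the step where done is True, a value of 21.0 can carry over into later steps where done is False, which would look like the output above.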

System Info

gym==0.18

[Bug Report] Vector env return value #3253