openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Bug Report] Vector env return value #3253

Open Root970103 opened 6 months ago

Root970103 commented 6 months ago

Describe the bug

Hi, I used the vector env API in gym to train on Atari-PongNoFrameSkip-v4. After the agent had interacted with the environment for a while, I noticed something strange: the cumulative reward was 21.0, but the corresponding done status was still False.

An illustrative example:

rewards: [21.0, 19.0, 17.0, 21.0, 18.0, 21.0]
done: [True, False, False, True, False, False]

In this case, an env that has reached the maximum reward is not marked as done, and its cumulative reward keeps increasing. Is this behavior expected?

System Info

gym==0.18

pseudo-rnd-thoughts commented 6 months ago

Without more of your code, it is difficult to tell what is happening. Also, this is for v0.18, which is several years old, so we wouldn't update any code unless this is still an issue now.
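
One thing that may be worth checking, purely as an assumption since the training code is not shown, is how the cumulative reward is tracked next to the done flags. Vector envs reset a sub-env automatically once it reports done, so a running total that is not cleared at that step carries over into the next episode. The sketch below is not the reporter's code and not part of gym; `update_returns` is a hypothetical helper that uses plain Python lists in place of the arrays a vector env returns.

```python
from typing import List, Sequence, Tuple


def update_returns(
    returns: Sequence[float],
    rewards: Sequence[float],
    dones: Sequence[bool],
) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Add this step's rewards to the per-env running totals.

    Returns the updated totals and a list of (env_index, episode_return)
    pairs for every env whose episode ended on this step. Totals of
    finished envs are cleared so the next episode starts from zero.
    """
    new_returns: List[float] = []
    finished: List[Tuple[int, float]] = []
    for i, (total, reward, done) in enumerate(zip(returns, rewards, dones)):
        total += reward
        if done:
            # Record the finished episode, then reset the running total.
            finished.append((i, total))
            total = 0.0
        new_returns.append(total)
    return new_returns, finished


if __name__ == "__main__":
    # Hypothetical values for two sub-envs; env 0 finishes on this step.
    totals, ended = update_returns([20.0, 5.0], [1.0, 1.0], [True, False])
    print(totals)
    print(ended)
```

If the totals in the report were accumulated without a reset like this, a value of 21.0 alongside done=False could simply be a total spanning more than one episode, though that cannot be confirmed from the information given.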