sail-sg / envpool

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
https://envpool.readthedocs.io
Apache License 2.0
1.09k stars 100 forks source link

[BUG] `Atlantis-v5` does not reset life counter. #171

Closed vwxyzjn closed 2 years ago

vwxyzjn commented 2 years ago

Describe the bug

When all the lives are exhausted in Atlantis-v5, making an additional step does not reset the life counter, whereas in Breakout-v5 it does. I am not sure if this is something particular with the Atlantis environment though.

To Reproduce

Steps to reproduce the behavior.

Please try to provide a minimal example to reproduce the bug. Error messages and stack traces are also helpful.

Please use the markdown code blocks for both code and stack traces.

import envpool
import numpy as np

num_envs = 1
print("making Atlantis-v5")
envs = envpool.make(
    "Atlantis-v5",
    env_type="gym",
    num_envs=num_envs,
    episodic_life=True,
    reward_clip=True,
)
envs.reset()
for i in range(10000):
    _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
    if info["lives"].sum() == 0:
        print(f"step={i}, lives is", info["lives"].sum())
        break

_, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
print(f"step={i+1}, lives is", info["lives"].sum())
print(f"notice how step={i+1} does not reset the life counter in Atlantis")

print("making Atlantis-v5")
envs = envpool.make(
    "Breakout-v5",
    env_type="gym",
    num_envs=num_envs,
    episodic_life=True,
    reward_clip=True,
)
envs.reset()
for i in range(10000):
    _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
    if info["lives"].sum() == 0:
        print(f"step={i}, lives is", info["lives"].sum())
        break

_, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
print(f"step={i+1}, lives is", info["lives"].sum())
print(f"notice how step={i+1} does reset the life counter in Breakout")
making Atlantis-v5
step=1148, lives is 0
step=1149, lives is 0
notice how step=1149 does not reset the life counter in Atlantis
making Atlantis-v5
step=123, lives is 0
step=124, lives is 5
notice how step=124 does reset the life counter in Breakout

Expected behavior

If all the lives are exhausted, making an additional step should reset the life counter.

System info

Describe the characteristic of your environment:

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
>>> print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
0.6.2.post2 1.23.1 3.8.11 (default, Oct  9 2021, 12:06:05) 
[GCC 10.3.0] linux

Reason and Possible fixes

Maybe this is

Checklist

vwxyzjn commented 2 years ago

Actually this happens to other envs:

import envpool
import numpy as np
from atari_data import atari_human_normalized_scores
num_envs = 1
for env_id in envpool.list_all_envs():
    # filter out non atari envs
    if "v5" not in env_id:
        continue
    envs = envpool.make(
        env_id,
        env_type="gym",
        num_envs=num_envs,
        episodic_life=True,
        reward_clip=True,
    )
    envs.reset()
    has_lives = False
    info = envs.step(np.zeros(num_envs, dtype=int))[-1]
    if info["lives"].sum() > 0:
        has_lives = True
    else:
        continue
    envs.reset()
    for i in range(10000):
        _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
        if info["lives"].sum() == 0:
            # print(f"step={i}, lives is", info["lives"].sum())
            break
    # take an additional step after all lives are exhausted
    _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
    if info["lives"].sum() == 0:
        print(f"{env_id} does not reset the life counter")
Atlantis-v5 does not reset the life counter
Atlantis2-v5 does not reset the life counter
BankHeist-v5 does not reset the life counter
DonkeyKong-v5 does not reset the life counter
Frogger-v5 does not reset the life counter
KeystoneKapers-v5 does not reset the life counter
LostLuggage-v5 does not reset the life counter
MrDo-v5 does not reset the life counter
SirLancelot-v5 does not reset the life counter
Turmoil-v5 does not reset the life counter
WizardOfWor-v5 does not reset the life counter
Trinkle23897 commented 2 years ago

You cannot instantly check if lives in t == 0 and lives in t+1 > 0: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/atari_wrappers.py#L76-L80

Trinkle23897 commented 2 years ago

There's a bug in this line: https://github.com/sail-sg/envpool/blob/ea86c2b77d12aaa58725bfeb1d701e3207f11822/envpool/atari/atari_env.h#L202 should be 0 < env_->lives() && env_->lives() < lives_ I'll make a PR to fix this.

vwxyzjn commented 2 years ago

Thank you! Also a related feature request - would it be possible to have a variable in the info section indicating if the environment is terminated?

Trinkle23897 commented 2 years ago
import envpool
import numpy as np
num_envs = 1
for env_id in envpool.list_all_envs():
    # filter out non atari envs
    if "v5" not in env_id:
        continue
    envs = envpool.make(
        env_id,
        env_type="gym",
        num_envs=num_envs,
        episodic_life=True,
        reward_clip=True,
    )
    envs.reset()
    has_lives = False
    info = envs.step(np.zeros(num_envs, dtype=int))[-1]
    if info["lives"].sum() > 0:
        has_lives = True
    else:
        continue
    envs.reset()
    for i in range(20000):
        _, _, done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
        if info["lives"].sum() == 0:
            break
    if info["lives"].sum() > 0:
        print(f"{env_id} steps too long")
        continue
    # for normal atari (e.g., breakout)
    # take an additional step after all lives are exhausted
    _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
    if done and info["lives"].sum() > 0:
        print(f"{env_id} pass check")
        continue
    assert not done
    for i in range(2000):
        _, _, done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
        if done:
            break
    _, _, next_done, info = envs.step(np.random.randint(0, envs.action_space.n, num_envs))
    if info["lives"].sum() > 0:
        print(f"{env_id} pass check in {i} steps")
    else:
        print(f"{env_id} does not reset the life counter")

Result after bug fix:

Adventure-v5 steps too long
AirRaid-v5 pass check
Alien-v5 pass check
Amidar-v5 pass check
Assault-v5 pass check
Asterix-v5 pass check
Asteroids-v5 pass check
Atlantis-v5 pass check in 146 steps
Atlantis2-v5 pass check in 30 steps
Backgammon-v5 steps too long
BankHeist-v5 pass check in 64 steps
BasicMath-v5 pass check
BattleZone-v5 pass check
BeamRider-v5 pass check
Berzerk-v5 pass check
Blackjack-v5 pass check
Breakout-v5 pass check
Casino-v5 pass check
Centipede-v5 pass check
ChopperCommand-v5 pass check
CrazyClimber-v5 pass check
Crossbow-v5 pass check
Darkchambers-v5 steps too long
Defender-v5 pass check
DemonAttack-v5 pass check
DonkeyKong-v5 pass check in 215 steps
Earthworld-v5 steps too long
ElevatorAction-v5 steps too long
Entombed-v5 pass check
Et-v5 pass check in 1371 steps
FlagCapture-v5 pass check
Frogger-v5 pass check in 55 steps
Frostbite-v5 pass check
Galaxian-v5 pass check
Gopher-v5 pass check
Gravitar-v5 pass check
Hangman-v5 pass check
HauntedHouse-v5 pass check
Hero-v5 pass check
HumanCannonball-v5 pass check
Jamesbond-v5 pass check
Kaboom-v5 pass check
Kangaroo-v5 pass check
KeystoneKapers-v5 pass check in 1164 steps
KingKong-v5 pass check
Klax-v5 pass check
Krull-v5 pass check
KungFuMaster-v5 pass check
LostLuggage-v5 pass check in 120 steps
MarioBros-v5 pass check
MiniatureGolf-v5 steps too long
MontezumaRevenge-v5 pass check
MrDo-v5 pass check in 74 steps
MsPacman-v5 pass check
NameThisGame-v5 pass check
Othello-v5 pass check
Pacman-v5 pass check
Phoenix-v5 pass check
Pitfall-v5 pass check
Pitfall2-v5 steps too long
Pooyan-v5 pass check
Qbert-v5 pass check
Riverraid-v5 pass check
RoadRunner-v5 pass check
Robotank-v5 pass check
Seaquest-v5 pass check
SirLancelot-v5 pass check in 38 steps
Solaris-v5 pass check
SpaceInvaders-v5 pass check
SpaceWar-v5 steps too long
StarGunner-v5 pass check
Superman-v5 steps too long
Surround-v5 pass check
TicTacToe3d-v5 pass check
TimePilot-v5 pass check
Trondead-v5 pass check
Turmoil-v5 pass check in 160 steps
Tutankham-v5 pass check
UpNDown-v5 pass check
Venture-v5 pass check
VideoCheckers-v5 pass check
VideoChess-v5 steps too long
VideoCube-v5 steps too long
VideoPinball-v5 pass check
WizardOfWor-v5 pass check in 15 steps
WordZapper-v5 pass check
YarsRevenge-v5 pass check
Zaxxon-v5 pass check
Trinkle23897 commented 2 years ago

Thank you! Also a related feature request - would it be possible to have a variable in the info section indicating if the environment is terminated?

Sure! Do you have any idea about the naming?

vwxyzjn commented 2 years ago

How about info['terminated']?

vwxyzjn commented 2 years ago

What does that even mean though that you can continue stepping into the environment when all the lives are exhausted?

Trinkle23897 commented 2 years ago

What does that even mean though that you can continue stepping into the environment when all the lives are exhausted?

Yes.

import gym

e = gym.make("AtlantisNoFrameskip-v4")
e.reset()
for t in range(10000):
    o, r, d, i = e.step(0)
    print(t, d, i)
    if d:
        break
1673 False {'ale.lives': 0}
1674 False {'ale.lives': 0}
1675 False {'ale.lives': 0}
1676 False {'ale.lives': 0}
1677 False {'ale.lives': 0}
1678 False {'ale.lives': 0}
1679 False {'ale.lives': 0}
1680 False {'ale.lives': 0}
1681 False {'ale.lives': 0}
1682 False {'ale.lives': 0}
1683 False {'ale.lives': 0}
1684 False {'ale.lives': 0}
1685 False {'ale.lives': 0}
1686 False {'ale.lives': 0}
1687 False {'ale.lives': 0}
1688 False {'ale.lives': 0}
1689 False {'ale.lives': 0}
1690 True {'ale.lives': 0}