takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

[BUG] Issue with rendering Atari environment #302

Closed: indweller closed this issue 1 year ago

indweller commented 1 year ago

I tried to render the Atari Pong environment, but I keep running into the error below. Code:

from d3rlpy.datasets import get_atari
from d3rlpy.algos import DQNConfig
from d3rlpy.metrics import TDErrorEvaluator, EnvironmentEvaluator

dataset, env = get_atari(env_name='pong-expert-v4')
dqn = DQNConfig().create(device='cuda:0')
dqn.build_with_dataset(dataset)

td_error_evaluator = TDErrorEvaluator(episodes=dataset.episodes)
env_evaluator = EnvironmentEvaluator(env, render=True)
rewards = env_evaluator(dqn, dataset=None)

The output is as follows:

A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:31: UserWarning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (84, 84)
  logger.warn(
loading /home/prashanth/.d4rl/datasets/Pong/5/50/observation.gz...
loading /home/prashanth/.d4rl/datasets/Pong/5/50/action.gz...
loading /home/prashanth/.d4rl/datasets/Pong/5/50/reward.gz...
loading /home/prashanth/.d4rl/datasets/Pong/5/50/terminal.gz...
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:233: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  if not isinstance(terminated, (bool, np.bool8)):
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:289: UserWarning: WARN: No render fps was declared in the environment (env.metadata['render_fps'] is None or not defined), rendering may occur at inconsistent fps.
  logger.warn(
Traceback (most recent call last):
  File "trial.py", line 11, in <module>
    rewards = env_evaluator(dqn, dataset=None)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/d3rlpy/metrics/evaluators.py", line 540, in __call__
    return evaluate_qlearning_with_environment(
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/d3rlpy/metrics/utility.py", line 63, in evaluate_qlearning_with_environment
    env.render()
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/core.py", line 329, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/core.py", line 329, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 51, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/env_checker.py", line 53, in render
    return env_render_passive_checker(self.env, *args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py", line 316, in env_render_passive_checker
    result = env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/d4rl_atari/envs.py", line 48, in render
    self._env.render(mode)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/core.py", line 329, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 51, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/env_checker.py", line 53, in render
    return env_render_passive_checker(self.env, *args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py", line 316, in env_render_passive_checker
    result = env.render(*args, **kwargs)
TypeError: render() takes 1 positional argument but 2 were given

I also tried training without rendering, loading the trained model separately, and then rendering during evaluation with gym.make('pong-expert-v4', render_mode='human') and env.render(), but the same error appears. I didn't face any issues while rendering CartPole.
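The last frames of the traceback suggest an API mismatch: d4rl_atari/envs.py calls self._env.render(mode), while the render() of the wrapped Gym environment accepts no mode argument. A minimal sketch of that situation, using stand-in classes (NewStyleEnv, OldStyleWrapper and try_render are hypothetical names, not Gym or d4rl-atari code):

```python
class NewStyleEnv:
    """Stand-in for an env following the newer render API: the mode is
    fixed at construction time and render() takes no arguments."""

    def __init__(self, render_mode=None):
        self.render_mode = render_mode

    def render(self):
        return f"frame rendered in mode {self.render_mode!r}"


class OldStyleWrapper:
    """Hypothetical wrapper written against the older API, where the mode
    was passed to render() on every call."""

    def __init__(self, env):
        self._env = env

    def render(self, mode="human"):
        # Forwards `mode` positionally, like the failing frame in the traceback.
        return self._env.render(mode)


def try_render(env):
    """Call render() and report the outcome as a string instead of raising."""
    try:
        return env.render()
    except TypeError as exc:
        return f"TypeError: {exc}"


wrapped = OldStyleWrapper(NewStyleEnv(render_mode="human"))
print(try_render(wrapped))                # reports a TypeError like the one above
print(try_render(NewStyleEnv("human")))   # calling render() with no mode works
```

The exact wording of the TypeError message varies slightly between Python versions, but the cause is the same: one extra positional argument reaches a render() that only expects self.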

takuseno commented 1 year ago

@indweller Thanks for reporting this. This happened because get_atari relies on another repository, d4rl-atari, where I have just fixed the issue. I've also updated d3rlpy to support the latest render interface in these commits: https://github.com/takuseno/d3rlpy/commit/a7207cd24455ae399c2f4217bb328940ab86b0e1 https://github.com/takuseno/d3rlpy/commit/2121edd853ea32a9fe6d262de639c69c13b50c24 . I'll release a patch that includes these fixes later today. When you try it, please reinstall d4rl-atari: pip install -U git+https://github.com/takuseno/d4rl-atari
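The linked commits are not reproduced here. As a rough sketch of what supporting the newer render interface can look like, assuming the mode is chosen once when the environment is created and render() is then called without arguments (InnerEnv and ForwardingWrapper are hypothetical stand-ins, not the actual d4rl-atari or d3rlpy code):

```python
class InnerEnv:
    """Stand-in for an env using the newer interface: render() takes no arguments."""

    def __init__(self, render_mode=None):
        self.render_mode = render_mode
        self.frames_drawn = 0

    def render(self):
        # Count calls so the forwarding can be observed.
        self.frames_drawn += 1
        return self.render_mode


class ForwardingWrapper:
    """Hypothetical wrapper that leaves the mode on the inner env and
    forwards render() without passing any argument."""

    def __init__(self, env):
        self._env = env

    @property
    def render_mode(self):
        # Expose the inner env's mode instead of accepting one per call.
        return self._env.render_mode

    def render(self):
        return self._env.render()


inner = InnerEnv(render_mode="human")
env = ForwardingWrapper(inner)
print(env.render_mode, env.render(), inner.frames_drawn)
```

With this shape, the caller never passes a mode to render(), so the wrapper and the wrapped environment agree on the signature.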

takuseno commented 1 year ago

The latest patch has been released. https://github.com/takuseno/d3rlpy/releases/tag/v2.0.4

takuseno commented 1 year ago

Just as a reference, you can enable rendering like this:

from d3rlpy.datasets import get_atari
from d3rlpy.algos import DQNConfig
from d3rlpy.metrics import TDErrorEvaluator, EnvironmentEvaluator

dataset, env = get_atari(env_name='pong-expert-v4', render_mode="human")
dqn = DQNConfig().create(device='cuda:0')
dqn.build_with_dataset(dataset)

td_error_evaluator = TDErrorEvaluator(episodes=dataset.episodes)
env_evaluator = EnvironmentEvaluator(env)
rewards = env_evaluator(dqn, dataset=None)

indweller commented 1 year ago

Thanks for your quick response, @takuseno! The code snippet you shared works fine. But when I use the code shown below, I get the following error.

import d3rlpy, d4rl_atari
import gym
import numpy as np

dqn = d3rlpy.load_learnable('./pongdqn.d3')

env = gym.make('pong-expert-v4', render_mode='human')
observations = env.reset()

observations = observations[0]

terminated = False
truncated = False
total = 0
positive_reward = 0

while not terminated and not truncated:
    action = dqn.predict(observations.reshape((1,1,84,84)))[0]
    observations, reward, terminated, truncated, info = env.step(action)
    env.render()
    print(f"Reward: {reward}\n")
    total += reward
    if reward > 0:
        positive_reward += reward

print(f"Total Reward: {total}, Positive Reward: {positive_reward}")

env.close()

Error:

2023-07-23 18:50:34 [warning  ] There might be incompatibility because of version mismatch. current_version=2.0.4 saved_version=2.0.3
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/envs/registration.py:623: UserWarning: WARN: The environment is being initialised with mode (human) that is not in the possible render_modes ([]).
  logger.warn(
A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:31: UserWarning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (84, 84)
  logger.warn(
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:233: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  if not isinstance(terminated, (bool, np.bool8)):
Traceback (most recent call last):
  File "testing.py", line 25, in <module>
    env.render()
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/core.py", line 329, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 51, in render
    return self.env.render(*args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/wrappers/env_checker.py", line 53, in render
    return env_render_passive_checker(self.env, *args, **kwargs)
  File "/home/prashanth/projects/sandbox/lib/python3.8/site-packages/gym/utils/passive_env_checker.py", line 307, in env_render_passive_checker
    assert (
AssertionError: With no render_modes, expects the Env.render_mode to be None, actual value: human
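
The warning earlier in this log says the environment was initialised with mode (human) while its possible render_modes are []; the assertion then rejects that combination. A simplified sketch of such a check, written from the messages in the log rather than copied from Gym (check_render_mode and describe are hypothetical names):

```python
def check_render_mode(metadata, render_mode):
    """Simplified stand-in for the consistency check that fails in the traceback.

    `metadata` plays the role of env.metadata and `render_mode` the role of
    env.render_mode; this is an illustration, not Gym's actual code.
    """
    render_modes = metadata.get("render_modes", [])
    if len(render_modes) == 0:
        # An env that declares no render modes must not have one set.
        assert render_mode is None, (
            "With no render_modes, expects the Env.render_mode to be None, "
            f"actual value: {render_mode}"
        )
    else:
        # Otherwise the mode must be unset or one of the declared modes.
        assert render_mode is None or render_mode in render_modes, (
            f"render_mode {render_mode!r} is not one of {render_modes}"
        )
    return True


def describe(metadata, render_mode):
    """Run the check and return either 'ok' or the assertion message."""
    try:
        check_render_mode(metadata, render_mode)
        return "ok"
    except AssertionError as exc:
        return str(exc)


# Same situation as in the log: no declared modes, but 'human' was requested.
print(describe({"render_modes": []}, "human"))
# An env that does declare 'human' passes the check.
print(describe({"render_modes": ["human", "rgb_array"]}, "human"))
```

In the maintainer's snippet above, the environment comes from get_atari(env_name='pong-expert-v4', render_mode="human") rather than gym.make; whether that environment also works in a hand-written loop like this one is not confirmed in the thread.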