ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Video recording not working for some environments (after PR #14796) #16200

Closed. rfali closed this issue 3 years ago.

rfali commented 3 years ago

Related PR: #14796

What is the problem?

I was trying to use the render and record options with PettingZoo environments, but only rendering works (and the pygame window crashes at the end of the episode); the recorder doesn't record anything at all (no videos folder is created either). Has this patch been verified to work with custom multi-agent envs?

First, I verified that rllib/examples/env_rendering_and_recording.py works: it renders and saves the videos, and prints a helpful message giving the path where each recorded video is saved.

I then tried 2 PettingZoo environments (waterworld and space_invaders); both of them rendered, but the pygame window crashes. If render is set to False, training completes, but there is no videos folder or any message saying videos are being saved. Here is the code I tried, adapted from one of the rllib examples, rllib/examples/multi_agent_parameter_sharing.py:

Ray version and other system information (Python version, TensorFlow version, OS): I installed the nightly wheels from here and also upgraded gym to the latest version.

ray: 2.0.0.dev0
gym: 0.18.3
pettingzoo: 1.8.2
python: 3.8.0
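
For reference, the installed versions can be confirmed from Python using only the standard library (importlib.metadata ships with Python 3.8):

# Print the installed versions of the relevant packages.
from importlib.metadata import version

for pkg in ("ray", "gym", "pettingzoo"):
    print(pkg, version(pkg))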

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

if __name__ == "__main__":

    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = env_creator({})
    register_env("waterworld", env_creator)

    obs_space = env.observation_space
    act_space = env.action_space

    policies = {"shared_policy": (None, obs_space, act_space, {})}

    # for all methods
    policy_ids = list(policies.keys())

    tune.run(
        "APEX_DDPG",
        stop={"episodes_total": 10},
        checkpoint_freq=10,
        local_dir="my_results",
        config={

            # Environment specific
            "env": "waterworld",

            # General
            "num_gpus": 1,
            "num_workers": 2,
            "num_envs_per_worker": 8,
            "learning_starts": 1000,
            "buffer_size": int(1e5),
            "compress_observations": True,
            "rollout_fragment_length": 20,
            "train_batch_size": 512,
            "gamma": .99,
            "n_step": 3,
            "lr": .0001,
            "prioritized_replay_alpha": 0.5,
            "final_prioritized_replay_beta": 1.0,
            "target_network_update_freq": 50000,
            "timesteps_per_iteration": 25000,

            # Method specific
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": (lambda agent_id: "shared_policy"),
            },
            "evaluation_interval": 1,
            "evaluation_num_episodes": 2,
            "evaluation_num_workers": 1,
            "evaluation_config": {
                "record_env": "videos",
                "render_env": False,
            },
        },
    )
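
To rule out a problem on the environment side, here is a quick sanity check that the raw PettingZoo env can produce RGB frames, which is what the recorder needs (this assumes waterworld_v3 supports the rgb_array render mode in pettingzoo 1.8.2; if it doesn't, that alone would explain the missing videos):

# Sanity check: the video recorder needs RGB frames from render().
# Assumes waterworld_v3 supports mode="rgb_array"; if this returns None
# or raises, recording cannot work for this env.
from pettingzoo.sisl import waterworld_v3

raw_env = waterworld_v3.env()
raw_env.reset()
frame = raw_env.render(mode="rgb_array")
print(type(frame), getattr(frame, "shape", None))  # expect an (H, W, 3) array
raw_env.close()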

This one uses the space_invaders game, and here I also moved the render and record settings out of the evaluation config, but the outcome was unchanged.

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.atari import space_invaders_v1

if __name__ == "__main__":

    def env_creator(args):
        return PettingZooEnv(space_invaders_v1.env())

    env = env_creator({})
    register_env("space_invaders", env_creator)

    obs_space = env.observation_space
    act_space = env.action_space

    policies = {"shared_policy": (None, obs_space, act_space, {})}

    # for all methods
    policy_ids = list(policies.keys())

    tune.run(
        "PPO",
        stop={"episodes_total": 10},
        checkpoint_freq=10,
        local_dir="my_results",
        config={
            # Environment specific
            "env": "space_invaders",

            # General
            "num_gpus": 1,
            "num_workers": 1,
            "num_envs_per_worker": 2,
            "record_env": "videos",
            "render_env": False,

        },
    )

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

sven1977 commented 3 years ago

Hey @rfali, could you take a look at the PR below and let me know whether it fixes your issue? It was a little tricky to fix. The reason is that RLlib uses the gym Monitor wrapper, which only works on gym.Env objects. MultiAgentEnv is not a gym.Env (even though it almost looks like one), so I had to create a new child wrapper that can handle MultiAgentEnv (which, e.g., returns dict rewards, not floats).

https://github.com/ray-project/ray/pull/16428
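
To illustrate the idea, here is a minimal sketch (not the actual code in the PR; class and method usage are illustrative, based on gym 0.18's Monitor internals): gym's Monitor assumes a scalar reward and a single bool done, so a multi-agent-aware child wrapper has to collapse the per-agent dicts before handing them to the recorder's bookkeeping.

# Simplified sketch only; see the PR above for the real implementation.
from gym.wrappers import Monitor

class MultiAgentMonitorSketch(Monitor):
    def step(self, action_dict):
        self._before_step(action_dict)
        obs, rewards, dones, infos = self.env.step(action_dict)
        # Monitor's stats recorder expects a float reward and a bool done,
        # so aggregate the per-agent dicts before passing them through.
        self._after_step(obs, sum(rewards.values()), dones["__all__"], infos)
        return obs, rewards, dones, infos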

rohin-dasari commented 2 years ago

The script provided by @rfali still doesn't work for me. I'm using ray version 1.11.0. When I run the script, the directory passed to record_env is created, but it stays empty and no recordings are produced. Is there anything that needs to be added to the script to save the recordings?

malintha commented 2 years ago

I'm experiencing the same on ray 2.0. I slightly changed the code above, since pettingzoo currently throws an error when trying to use the waterworld environment, so I switched to simple_spread and changed the algorithm to DQN. Here is my reproduction code.


from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.mpe import simple_spread_v2
import ray
import random
import numpy as np

if __name__ == "__main__":

    seed = 0
    random.seed(seed)
    np.random.seed(seed)

    def env_creator(args):
        return PettingZooEnv(simple_spread_v2.env(continuous_actions=False))

    env = env_creator({})
    register_env("simple_spread", env_creator)

    obs_space = env.observation_space
    act_space = env.action_space

    policies = {"shared_policy": (None, obs_space, act_space, {})}

    # for all methods
    policy_ids = list(policies.keys())
    ray.init(log_to_driver=False, num_gpus=1) 
    tune.run(
        "DQN",
        stop={"episodes_total": 1500},
        checkpoint_freq=10,
        local_dir="my_results",
        config={

            # Environment specific
            "env": "simple_spread",
            "framework":"torch",
            # General
            "num_gpus": 1,
            "num_workers": 6,
            "num_envs_per_worker": 8,
            "learning_starts": 1000,        
            "compress_observations": True,
            "rollout_fragment_length": 20,
            "train_batch_size": 32,
            "gamma": .99,
            "lr": .00005,

            # Method specific
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": (lambda agent_id: "shared_policy"),
            },
            "evaluation_interval": 1,
            "evaluation_num_episodes": 2,
            "evaluation_num_workers": 1,
            "evaluation_config": {
                "record_env": "videos",
                "render_env": False,
            },
        },
    )