LingfengTao commented 2 years ago

High: It blocks me to complete my task.

Hi, I’m new to OpenAI Gym and RLlib. SO my question may be dumb. I'm using Anaconda python 3.9 Gym 0.21.0 Ray 1.12.1 Tensorflow 2.8 Torch 1.11.0

Recently I’m doing a multi-agent project and trying to convert the OpenAI Gym robotics environments (Fetch and Handmanipulate) to the multiagent environment with the make_multi_agent wrapper. I modified the simple example and here is my code:

`import ray from ray.rllib.agents.ddpg import DDPGTrainer from ray.tune.registry import register_env

def env_creator(env_config): ma_hand_cls = ray.rllib.env.multi_agent_env.make_multi_agent("HandManipulateBlock-v0") ma_hand = ma_hand_cls({"num_agents": 2}) return ma_hand register_env("ma_hand", env_creator)

Configure the algorithm.

config = {

Environment (RLlib understands openAI gym registered strings).

"env": "ma_hand",
# Use 2 environment workers (aka "rollout workers") that parallelly
# collect samples from their own environment clone(s).
"num_workers": 2,
# Change this to "framework: torch", if you are using PyTorch.
# Also, use "framework: tf2" for tf2.x eager execution.
"framework": "tf",
"render_env": True,
# Tweak the default model provided automatically by RLlib,
# given the environment's observation- and action spaces.
"model": {
    "fcnet_hiddens": [64, 64],
    "fcnet_activation": "relu",
},
# Set up a separate evaluation worker set for the
# `trainer.evaluate()` call after training (see below).
"evaluation_num_workers": 1,
# Only for evaluation runs, render the env.
"evaluation_config": {
    "render_env": True,
},
#"disable_env_checking": True,

}

Create our RLlib Trainer.

trainer = DDPGTrainer(config=config)

Run it for n training iterations. A training iteration includes

parallel sample collection by the environment workers as well as

loss calculation on the collected batch and a model update.

for _ in range(3): print(trainer.train())

Evaluate the trained Trainer (and render each timestep to the shell's

output).

trainer.evaluate()`

When I try to create a trainer with the converted environment, it gives this error: “ValueError: The observation collected from env.reset was not contained within your env’s observation space. It's possible that there was a typemismatch (for example observations of np.float32 and space of np.float64 observations), or that one of the sub-observations wasout of bounds“

I can bypass this error by setting “disable_env_checking=True“ in the config. But after training, the trainer.evaluate() can evaluate the trained policy, but the render is not working (no rendered window pop out). Here are the output of trainer.evaluate():

Out[20]: {'evaluation': {'episode_reward_max': -100.0, 'episode_reward_min': -100.0, 'episode_reward_mean': -100.0, 'episode_len_mean': 50.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0], 'episode_lengths': [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.09725146188945352, 'mean_inference_ms': 0.4013698258085879, 'mean_action_processing_ms': 0.0842750191450595, 'mean_env_wait_ms': 1.6527913525670825, 'mean_env_render_ms': 0.04739675693169325}, 'off_policy_estimator': {}, 'timesteps_this_iter': 0}}

Any idea how to solve this problem? Thanks so much!

gjoliver commented 2 years ago

Acknowledge. MultiAgentEnv use the spaces of the original env as obs and action spaces, but gives data in multi-agent format. Disable the env check is the right workaround for now. Do you see any error messages when you say that the render window doesn't pop out? Looking at the code, it basically just calls render() on the first agent. So not sure why it doesn't work. When you use the env directly, does rendering work?