Here are a few resources that have helped me better understand the MultiAgentEnv:
https://ray.readthedocs.io/en/latest/rllib-env.html
https://github.com/ray-project/ray/blob/master/rllib/examples/rock_paper_scissors_multiagent.py#L7
https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_two_trainers.py
https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_cartpole.py
https://github.com/ray-project/ray/blob/master/rllib/env/multi_agent_env.py
In particular, looking at the MultiAgentEnv interface documentation, we should notice a few similarities and differences between MultiAgentEnv and gym.Env.
Similarities:
Differences:
So it seems to me that we need to stop thinking about multi-agent environments as extensions of gym environments, because they are quite different.
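For concreteness, here is a minimal sketch of the dict-keyed interface MultiAgentEnv expects; the class name, agent ids, and the 5-element observations are illustrative, not taken from any RLlib example:

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class TwoAgentEnv(MultiAgentEnv):
    def reset(self):
        # Unlike gym.Env, reset() returns a dict keyed by agent id, not a single observation
        return {0: [0.0] * 5, 1: [0.0] * 5}

    def step(self, action_dict):
        # action_dict maps agent id -> action; all return values are per-agent dicts
        obs = {agent_id: [0.0] * 5 for agent_id in action_dict}
        rewards = {agent_id: 0.0 for agent_id in action_dict}
        dones = {agent_id: False for agent_id in action_dict}
        dones["__all__"] = False  # special key: True ends the episode for every agent
        infos = {agent_id: {} for agent_id in action_dict}
        return obs, rewards, dones, infos

Compare this with gym.Env, where reset() and step() deal with a single observation, reward, and done flag.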
Thanks @rusu24edward for the explanation. I agree with you that the MultiCartPole example is misleading; we need to define agents that inherit from gym, while the multi-agent env only needs to inherit from MultiAgentEnv. As I explained, I have defined the state, action, reward, and done as dictionaries, and inside the state each agent's observation is stored as a list. My main problem is how to define the observation_space and action_space. I do not want to define them when I call the policy, since the environment is going to be used from other packages too.
Right now, the state of the problem looks like {0: [0.0, 0.0, 0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0, 0.0, 0.0], 2: [0.0, 0.0, 0.0, 0.0, 0.0], 3: [0.0, 0.0, 0.0, 0.0, 0.0]}. When I define a dictionary of gym.spaces, e.g. {0: Box(.), 1: Box(.), ....}, I get this error:
Traceback (most recent call last):
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray_PPO:train() (pid=22333, host=polyp30)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 372, in __init__
Trainable.__init__(self, config, logger_creator)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 96, in __init__
self._setup(copy.deepcopy(self.config))
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 492, in _setup
self._init(self.config, self.env_creator)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 109, in _init
self.config["num_workers"])
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 537, in _make_workers
logdir=self.logdir)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 64, in __init__
RolloutWorker, env_creator, policy, 0, self._local_config)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 220, in _make_worker
_fake_sampler=config.get("_fake_sampler", False))
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 348, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 741, in _build_policy_map
obs_space, merged_conf.get("model"))
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/models/catalog.py", line 367, in get_preprocessor_for_space
cls = get_preprocessor(observation_space)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 254, in get_preprocessor
legacy_patch_shapes(space)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 290, in legacy_patch_shapes
return space.shape
AttributeError: 'dict' object has no attribute 'shape'
Similarly, when I create a single space, e.g. Box(shape=(number_of_agents, stateDim,)), I get ValueError: ('Observation outside expected value range', Box(4, 5), array([0., 0., 0., 0., 0.])):
Traceback (most recent call last):
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray_PPO:train() (pid=9308, host=polyp30)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 418, in train
raise e
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _train
fetches = self.optimizer.step()
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 140, in step
self.num_envs_per_worker, self.train_batch_size)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/optimizers/rollout.py", line 29, in collect_samples
next_sample = ray_get_and_free(fut_sample)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/utils/memory.py", line 33, in ray_get_and_free
result = ray.get(object_ids)
ray.exceptions.RayTaskError(ValueError): ray_RolloutWorker:sample() (pid=9292, host=polyp30)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 469, in sample
batches = [self.input_reader.next()]
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next
batches = [self.get_data()]
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 99, in get_data
item = next(self.rollout_provider)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 319, in _env_runner
soft_horizon, no_done_at_end)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 407, in _process_observations
policy_id).transform(raw_obs)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 166, in transform
self.check_shape(observation)
File "/scratch/afo214/anaconda3/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 65, in check_shape
self._obs_space, observation)
ValueError: ('Observation outside expected value range', Box(4, 5), array([0., 0., 0., 0., 0.]))
Any idea how to fix this issue?
Regarding the first error, I believe the observation_space and action_space for each policy must be a gym.spaces object. It looks like you are attempting to pass a dictionary of IDs mapping to gym.spaces objects as the observation_space for a single policy. This won't work: each policy must have a gym.spaces object. You can create a single policy like this and then map a bunch of agents to that policy. For example, the traffic light env shows that all traffic light agents map to the same policy and all cars map to a random selection of car policies, but the policies themselves are just defined with a gym.spaces object for the observation_space and the action_space.
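To make that concrete, here is a hedged sketch (the space shapes and names are illustrative, not taken from the traffic light example) of a single policy defined with plain gym.spaces objects and shared by every agent through policy_mapping_fn:

import numpy as np
from gym.spaces import Box, Discrete

# One observation space and one action space for the single shared policy (shapes are illustrative)
obs_space = Box(low=0.0, high=10.0, shape=(5,), dtype=np.float32)
act_space = Discrete(4)

multiagent_config = {
    "policies": {
        # policy id -> (policy class or None for the trainer default, obs space, act space, extra config)
        "shared_policy": (None, obs_space, act_space, {}),
    },
    # every agent id maps to the same policy
    "policy_mapping_fn": lambda agent_id: "shared_policy",
}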
Regarding the second error, I can think of two things. First, if you really are doing Box(shape=(4,5)), then this should produce an error because gym's Box requires the low and high arguments. Second, I tend to flatten my observation/action spaces into single-dimensional entities. This is something I picked up from using stable-baselines, which required it. I'm not sure if rllib requires this, but it is worth looking into.
I'm not sure what you mean by agents needing to inherit from gym. The gym interface is for environments, and typically we think of agents as algorithms that learn a policy by interacting with the environment. The algorithms don't need to inherit from gym; they just expect the interface and underlying data structures (gym.spaces).
Hope this helps!
@rusu24edward As I mentioned, I prefer to include the observation_space and action_space as part of the environment properties, since I want to use the env with packages other than ray too. So passing the observation_space and action_space the way it is done in the traffic light env example does not work for me. So, you mean that there is no way of having what I want, right?
Besides, in the second approach I do provide the rest of the required input parameters to define a Box, e.g. spaces.Box(low=0, high=10, shape=(config.NoAgent, config.stateDim,), dtype=np.float32). So probably the problem is that it expects a single observation covering all agents, while I am passing a dictionary of per-agent states.
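If that is the cause, one hedged sketch of a fix (stateDim = 5 is taken from the state printed above; the bounds are illustrative) is to make the per-policy space describe a single agent's observation and keep the per-agent dict as is:

import numpy as np
from gym.spaces import Box

stateDim = 5  # length of each agent's observation in the state shown above

# The policy's observation_space describes ONE agent's observation, not all agents stacked together
per_agent_obs_space = Box(low=0, high=10, shape=(stateDim,), dtype=np.float32)

# Each value in the dict returned by reset()/step() should then fit that space
state = {i: np.zeros(stateDim, dtype=np.float32) for i in range(4)}
assert all(per_agent_obs_space.contains(s) for s in state.values())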
About the inheritance, I did not mean the RL agent which learns the policy; I meant the agents in the environment, which hold the state, take the action and play it, and return the new state, reward, and done.
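For what it's worth, a minimal sketch of what such an in-environment agent object could look like (entirely illustrative; this is not an RLlib class):

import numpy as np

class EnvAgent:
    # A plain container that lives inside the environment and holds one agent's state
    def __init__(self, state_dim):
        self.state = np.zeros(state_dim, dtype=np.float32)

    def play(self, action):
        # Apply the chosen action, then report the new state, reward, and done flag
        self.state = self.state + action        # placeholder dynamics
        reward = -float(np.abs(self.state).sum())
        done = False
        return self.state, reward, done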
Do you expect that all the agents interacting with your environment will have the same observation_space and action_space?
Do you expect that all the agents interacting with your environment will have the same observation_space and action_space?
For this environment, all the observation_spaces are the same, but the action_spaces might be different. Let's assume the action_spaces are also equal. Is there any solution for this case?
If the observation and action space for all agents interacting with your environment is the same, then you can store that info in a single location (like the environment) and grab it from there. You could add a static function that returns the space. For example:
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from gym.spaces import Box

class CustomEnv(MultiAgentEnv):
    def __init__(self, *args, **kwargs):
        ...

    @staticmethod
    def get_observation_space(param=None):
        # build the observation space from param (details elided)
        return Box(param, ...)

    @staticmethod
    def get_action_space(param=None):
        # build the action space from param (details elided)
        return Box(param, ...)

    def step(self, action_dict):
        ...
Then when using ray, you can just do something like:
from ray.rllib.agents import pg          # older rllib exposes pg.PGAgent; newer versions name it pg.PGTrainer
from custom_env import CustomEnv         # assuming CustomEnv lives in a module named custom_env

trainer = pg.PGAgent(env=CustomEnv, config={
    "multiagent": {
        "policies": {
            # policy id -> (policy class or None for the default, obs space, act space, extra config)
            "default": (None, CustomEnv.get_observation_space(), CustomEnv.get_action_space(), {"gamma": 0.85}),
        },
        "policy_mapping_fn": lambda agent_id: "default",
    },
})

while True:
    print(trainer.train())
It doesn't have to look exactly like this; it is just a design idea to get you thinking.
So, you mean that there is no way of having what I want, right?
You have a bit of a design conflict here. Putting a single observation space and a single action_space in the environment indicates, by design, that all agents interacting with the environment can expect to see the same observation and action space. Now, you can still put the observation and action space in the environment even if they are different, but you have to have some kind of mapping from agent "types" to the spaces. Here is an example:
import ray
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from gym.spaces import Box

class CustomEnv(MultiAgentEnv):
    action_mapping = {
        'type0': Box(-1, 1, shape=(12,)),
        'type1': Box(10, 20, shape=(8, 2))
    }
    obs_mapping = {
        'type0': Box(-4, 5, shape=(12, 5)),
        'type1': Box(10, 20, shape=(8, 2))
    }

    @staticmethod
    def get_obs_space(agent_type):
        return CustomEnv.obs_mapping[agent_type]

    @staticmethod
    def get_action_space(agent_type):
        return CustomEnv.action_mapping[agent_type]

    def step(self, action_dict):
        pass

# Test it
obs_space = CustomEnv.get_obs_space('type1')
print(obs_space)
Then when using ray, you can just do something like:
from ray.rllib.agents import pg          # older rllib exposes pg.PGAgent; newer versions name it pg.PGTrainer
from custom_env import CustomEnv         # assuming CustomEnv lives in a module named custom_env

trainer = pg.PGAgent(env=CustomEnv, config={
    "multiagent": {
        "policies": {
            "type0_policy": (None, CustomEnv.get_obs_space('type0'), CustomEnv.get_action_space('type0'), {"gamma": 0.85}),
            "type1_policy": (None, CustomEnv.get_obs_space('type1'), CustomEnv.get_action_space('type1'), {"gamma": 0.85}),
        },
        "policy_mapping_fn": lambda agent_id: 'type0_policy' if agent_id == 'type0_agent1' else 'type1_policy',
    },
})

while True:
    print(trainer.train())
There are a lot of ways you can do this; it really just depends on how you want to design it.
Thanks for the detailed explanation. This approach works when I have env-agents with different action spaces.
Ray version and other system information (Python version, TensorFlow version, OS): I have Debian 8.7, Python 3.7.4, and TensorFlow 2.0.1.
What is your question?
I have a custom multi-agent environment which uses spaces from gym to define the observation_space and a discrete action_space, and includes reset and step functions. This environment can have any number of agents greater than one. I am going to use this environment in ray and run the different multi-agent algorithms implemented there. When I tried my current environment with ray, it did not work, with a weird error about not being able to read the state. So I found that I actually do not know how to define these in my environment to match what ray expects. I have some questions about this:
1. How should the action_space be defined? For example, one can think of a list of gym.spaces.Discrete(.) like [gym.spaces.Discrete(.), gym.spaces.Discrete(.), ...], or a dictionary of those spaces like {0: gym.spaces.Discrete(.), 1: gym.spaces.Discrete(.), ...}, or a gym.spaces.MultiDiscrete(.). Which one is preferred by ray?
2. The same question applies to the observation_space.
3. When env.reset() is called, it returns the state, which can be a list or a dictionary. This state has to be provided to ray to run an RL algorithm. Again, how should it be structured? A list or a dictionary? e.g. [s^0_0, s^0_1, s^0_2, ....] or {0: s^0_0, 1: s^0_1, 2: s^0_2, ....} or something else?
Please let me know if I need to add more details for each question.
Thanks in advance, Afshin