[Closed] krishna-bala closed this issue 3 years ago
Hi @krishna-bala, this all looks nice and makes sense. I suggest you change the action passed to the env in `trainA2C.py` to be a dictionary of actions, where the key is the id of the agent you'd like to take that random action, e.g., `action = {0: [spd, heading]}`.
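A minimal sketch of the suggested per-agent action dict (the speed and heading values here are placeholders):

```python
# Key = agent id, value = that agent's [spd, heading] command.
spd, heading = 1.0, 0.25          # placeholder values
action = {0: [spd, heading]}      # agent 0 takes this action

# Indexing by agent id returns the full 2-element command,
# not a single scalar as with a flat (2,) array.
print(action[0])                  # -> [1.0, 0.25]
```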
Hi @mfe7 -- I just noticed that mistake as well after comparing my code to `example.py`. I modified my code to instantiate actions as a dict, which solves the error from `self._take_action(actions, dt)` in `collision_avoidance_env.py`. Now, an action list of `[delta heading angle, speed]` should get passed to `agent.policy.external_action_to_action()`.
However, the VecEnv wrapper around the `CollisionAvoidanceEnv` (the `env` object returned by `create_env()`) calls the `step()` method from `vec_env.py` in the baselines library. That `step()` method calls `step_async()` and `step_wait()` from `dummy_vec_env.py` (also in the baselines library). In `step_wait()`, the action dict gets converted to a list before `env.step()` (the method defined in `collision_avoidance_env.py`) is called. See the code below.
```python
def step_wait(self):
    print('step_wait() method')
    for e in range(self.num_envs):
        action = self.actions[e]
        # if isinstance(self.envs[e].action_space, spaces.Discrete):
        #     action = int(action)
        print('action from step_wait() method: {}'.format(action))
        obs, self.buf_rews[e], self.buf_dones[e], self.buf_infos[e] = self.envs[e].step(action)
        if self.buf_dones[e]:
            obs = self.envs[e].reset()
        self._save_obs(e, obs)
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
            self.buf_infos.copy())
```
The `one_env` (unwrapped) environment doesn't have this issue, since the action dict is never converted to a list before `env.step()` is called. Should I nest a dict inside of the actions dict when using the VecEnv wrapper, e.g. `action = {0: {0: [spd, heading]}}`?
Here is the random agent that works properly on the `one_env` environment:

```python
import os
os.environ['GYM_CONFIG_CLASS'] = 'Train'

import gym
from gym_collision_avoidance.envs import Config
import gym_collision_avoidance.envs.test_cases as tc
from gym_collision_avoidance.experiments.src.env_utils import create_env
from stable_baselines.common.policies import MlpPolicy
# from stable_baselines.common import make_vec_env
from stable_baselines import A2C
from stable_baselines.common.env_checker import check_env

# env: a VecEnv wrapper around the CollisionAvoidanceEnv
# one_env: an actual CollisionAvoidanceEnv class (the unwrapped version of the first env in the VecEnv)
env, one_env = create_env()

# check_env(env, warn=True)
# model = A2C(MlpPolicy, env, verbose=1)
# model.learn(total_timesteps=1000)

# The reset method is called at the beginning of an episode
obs = one_env.reset()

num_episodes = 1000
for i in range(num_episodes):
    actions = {}
    actions[0] = one_env.action_space.sample()
    obs, reward, done, info = one_env.step(actions)
    if done:
        obs = one_env.reset()
```
Hmm, is it possible for you to send a list of dicts (e.g., `actions = [{0: [spd, heading]}]`) to `env.step`? The VecEnv wrapper you posted would then grab the first element of that list and send it to environment 0.
I think that's essentially what we did here: https://github.com/mit-acl/gym-collision-avoidance/blob/dceff1094affdaabeaf624976ffa3918addbe17d/gym_collision_avoidance/experiments/src/env_utils.py#L45-L52
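A minimal sketch (with placeholder values) of why a list of dicts survives the wrapper's per-environment indexing:

```python
# DummyVecEnv.step_wait does `action = self.actions[e]` for each env index e,
# so a list with one action dict per environment is unpacked intact.
spd, heading = 1.0, 0.5            # placeholder values
actions = [{0: [spd, heading]}]    # one dict for the single environment

num_envs = 1
for e in range(num_envs):
    action = actions[e]            # the dict {0: [spd, heading]} for env 0
    assert isinstance(action, dict)
    assert action[0] == [spd, heading]
```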
Yes that works! Thank you.
I'm attempting to train a new policy in the "CollisionAvoidance-v0" env using the A2C model from stable-baselines.
Following documentation instructions, I created the following files:
I've had trouble getting `model.learn()` from stable-baselines to work, so in the meantime I am simply implementing a random agent that samples a possible action:
When I attempt to debug the code, the following error occurs: `action = env.action_space.sample()` produces a (2,) array of actions as expected. However, during `env.step(action)`, the function `self._take_action(actions, dt)` is called in the `collision_avoidance_env.py` file, where I run into an error: the `agent.policy.external_action_to_action()` method passes `actions[agent_index]` as the action. However, my `actions` variable is a (2,) array with a heading and speed for my agent. Therefore, only the delta heading angle is passed as an argument (and there would be an index error if more than 2 agents were defined in `class Train()` of `config.py`). Am I implementing a portion of the code incorrectly, or should I change the `_take_actions(actions, dt)` method in `collision_avoidance_env.py` so that it passes the (2,) array of actions to the `external_action_to_action()` method? (see below for change)
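For context, the indexing problem described here can be reproduced in isolation (placeholder values; this is a sketch of the symptom, not the repo's actual fix):

```python
import numpy as np

flat_actions = np.array([0.1, 1.0])   # (2,) array: [delta heading, speed]
agent_index = 0

# Indexing the flat array by agent id yields only the delta-heading scalar,
# not the full [delta heading, speed] pair the policy expects.
passed = flat_actions[agent_index]
assert np.isscalar(passed)

# With a per-agent dict, the same lookup returns the whole command.
dict_actions = {0: [0.1, 1.0]}
assert dict_actions[agent_index] == [0.1, 1.0]
```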