Closed (wyt2019suzhou closed this issue 4 years ago)
`from make_env import make_env
import numpy as np

env = make_env('simple_reference')
for i_episode in range(2):
    observation = env.reset()
    for t in range(10):
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            agent_action_space = env.action_space[i]
            action = agent_action_space.sample()
            agent_actions.append(action)
        observation, reward, done, info = env.step(agent_actions)
        print('u', agent.action.u)
        print(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)`
I tried this code, but it raised an error.
I was also struggling with that, but I finally got it. The actions have to look like this:
`act_0 = [0., 0.5, 0.6, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]`
`act_1 = [0., 0.5, 0.6, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]`
`env.step([act_0, act_1])`
The first 5 numbers correspond to the 5 actions, and the environment selects the action with the maximum value. The following 10 numbers form a message vector, which is added to the next observation. You can print it out and see.
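In case it helps, here is a minimal sketch of building such a vector by hand instead of typing out 15 numbers. It assumes the layout described above (5 action slots followed by 10 message slots). `make_action`, `N_MOVE` and `N_COMM` are names made up for this example and are not part of the environment's API.

```python
from typing import List

N_MOVE = 5   # assumed number of action slots, per the description above
N_COMM = 10  # assumed number of message slots, per the description above


def make_action(move_idx: int, comm_idx: int) -> List[float]:
    """Concatenate a one-hot action vector and a one-hot message vector.

    Hypothetical helper for illustration only.
    """
    if not 0 <= move_idx < N_MOVE:
        raise ValueError(f"move_idx must be in [0, {N_MOVE - 1}]")
    if not 0 <= comm_idx < N_COMM:
        raise ValueError(f"comm_idx must be in [0, {N_COMM - 1}]")
    move = [0.0] * N_MOVE
    comm = [0.0] * N_COMM
    move[move_idx] = 1.0
    comm[comm_idx] = 1.0
    # Final layout: [action slots..., message slots...], length 15
    return move + comm


act_0 = make_action(move_idx=1, comm_idx=0)
act_1 = make_action(move_idx=2, comm_idx=3)
print(len(act_0), act_0)
print(len(act_1), act_1)

# With the environment from the question, these would be passed as:
# env.step([act_0, act_1])
```

The point is only that each agent gets one flat list of length 15, not the pair of integers that `agent_action_space.sample()` appears to return.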
What is the input format of the action in 'simple_reference'? What do the values mean: 'left, right, ...' or something else? How do I get U from the action?