What is the input form of the action in ‘simple_reference’?

openai / multiagent-particle-envs

Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"

https://arxiv.org/pdf/1706.02275.pdf

MIT License

2.33k stars 785 forks

What is the input form of the action in ‘simple_reference’? #76

Closed: wyt2019suzhou closed this issue 4 years ago

wyt2019suzhou commented 4 years ago

What is the input form of the action in ‘simple_reference’? What do the actions mean: ‘left, right, ...’ or something else? And how do I get U from the action?

wyt2019suzhou commented 4 years ago

`from make_env import make_env
import numpy as np

env = make_env('simple_reference')
for i_episode in range(2):
    observation = env.reset()
    for t in range(10):
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            agent_action_space = env.action_space[i]
            action = agent_action_space.sample()
            agent_actions.append(action)
        observation, reward, done, info = env.step(agent_actions)
        print('u', agent.action.u)
        print(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)`

I tried this code, but it raised an error.

HanbumKo commented 4 years ago

I was also struggling with that, but I finally got it. The input has to look like this:

act_0 = [0., 0.5, 0.6, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
act_1 = [0., 0.5, 0.6, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
env.step([act_0, act_1])

The first 5 numbers correspond to the 5 actions, and the environment selects the action with the maximum value. The following 10 numbers form a message vector, which is added to the next observation. You can print it out and see.
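To make that layout concrete, here is a minimal sketch that builds one 15-element action per agent from two indices. It assumes the 5 + 10 split described above. `build_action`, `N_ACTIONS`, and `N_MESSAGE` are hypothetical names used only for illustration and are not part of the repository. The thread does not say which action index corresponds to which direction, so the sketch makes no claim about that.

```python
# Sizes taken from the description above (assumption: 5 action slots, 10 message slots).
N_ACTIONS = 5
N_MESSAGE = 10


def build_action(action_index, message_index):
    """Return a flat 15-element list: one-hot action part followed by one-hot message part."""
    action_part = [0.0] * N_ACTIONS
    action_part[action_index] = 1.0
    message_part = [0.0] * N_MESSAGE
    message_part[message_index] = 1.0
    return action_part + message_part


# One action vector per agent; 'simple_reference' has two agents in the example above.
act_0 = build_action(1, 0)
act_1 = build_action(2, 3)

# With an environment created as in the question, the step call would then be:
# env.step([act_0, act_1])
```

A one-hot vector is only one option. The example above passes fractional values such as 0.5 and 0.6 in the same slots, so continuous values appear to be accepted as well.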