Adaickalavan opened this issue 3 years ago
I will upload a multi-agent PPO example; you can refer to it.
I have added a new example; you can find it at `xingtian/examples/ma_cases/ppo_share_catch_pigs.yaml`.
I have several questions as follows:
[1] Could you explain the difference between the settings `self.env_info["api_type"] == "standalone"` and `self.env_info["api_type"] == "unified"` using an example? When do we use each of them?
[2] I tried using a custom environment and a custom agent in XingTian, with 2 agents (i.e., multi-agent) training with 1 environment and 1 learner. The custom environment accepts agent actions as a dict of the form `{"0": Action_of_agent_0, "1": Action_of_agent_1}` and returns a dict of the form `{"0": Observation_of_agent_0, "1": Observation_of_agent_1}` on reset and on each step. The custom agent implements an `infer_action` function, which accepts an input `raw_state` of the form `Observation_of_agent_x` and returns an `action` of the form `Action_of_agent_x`.
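For concreteness, here is a minimal Python sketch of the interfaces just described. The class names `CustomEnv`/`CustomAgent` and the observation shapes are hypothetical; only the dict-keyed `reset`/`step` and the per-agent `infer_action` follow the description above.

```python
import numpy as np

class CustomEnv:
    """Environment keyed by agent id ("0" and "1")."""

    def reset(self):
        # One observation per agent.
        return {"0": np.zeros(4), "1": np.zeros(4)}

    def step(self, actions):
        # `actions` is {"0": Action_of_agent_0, "1": Action_of_agent_1}.
        obs = {"0": np.zeros(4), "1": np.zeros(4)}
        rewards = {"0": 0.0, "1": 0.0}
        done, info = False, {}
        return obs, rewards, done, info

class CustomAgent:
    """One instance per agent; infer_action handles a single agent."""

    def infer_action(self, raw_state):
        # `raw_state` is Observation_of_agent_x, NOT the full dict.
        return int(np.random.randint(2))  # placeholder discrete action
```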
[2a] When the training was run with `api_type==unified`, an error occurred. The code appears to skip the `Agent` block and proceed directly to the `Algorithm` block, feeding the `states` from the `Environment` block straight to `self.algs[0].predict(states)`. Now assume we want the two agents to use different algorithms. How can we achieve this with `api_type==unified`, since `self.algs` is of length 1?

[2b] On the other hand, when the training was run with `api_type==standalone`, an error occurred: the entire observation dict (i.e., `{"0": Observation_of_agent_0, "1": Observation_of_agent_1}`) is fed to the `infer_action` function. However, `infer_action` accepts a `raw_state` for a single agent at a time, of the form `Observation_of_agent_x`.

[2c] What should I do to achieve two agents (i.e., multi-agent) training with N environments and M learners, given the above custom environment and custom agent interfaces?
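To make the [2b] mismatch concrete, here is a tiny self-contained sketch (names and payloads illustrative only) contrasting the per-agent call `infer_action` expects with the whole-dict call that standalone mode reportedly makes:

```python
def infer_action(raw_state):
    # Expects a single agent's Observation_of_agent_x (a flat sequence).
    return sum(raw_state)  # placeholder computation

obs = {"0": [0.1, 0.2], "1": [0.3, 0.4]}  # full multi-agent observation dict

print(infer_action(obs["0"]))  # intended per-agent call: works
try:
    infer_action(obs)          # what reportedly happens in standalone mode
except TypeError as err:
    print("whole dict fed to infer_action:", err)
```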
[3] Refer to this portion of the code:
https://github.com/huawei-noah/xingtian/blob/9dee512245f777c20c65b8e198fde95ab61aa507/xt/framework/agent_group.py#L436-L462
Assume we are training 2 agents (i.e., multi-agent) with 1 environment and 1 learner. When using `api_type==standalone`, each agent appears to be executed in the same environment instance for one full episode, using separate threads via `self.bot.do_multi_job(job_funcs, _paras)`. Is this correct, and do the two agent threads run synchronously with each other?
[1] "standalone" means the simulator provides an independent interface for each agent, "unified" means all agents share one interface like smarts
[2a] You should convert the observation to numpy type in your agent module. [2c] You can set env_num=N to interact with N environments, but we only support one learner for now; all training data from the N environments will be sent to that learner for training.
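One possible way to follow the [2a] advice, as a sketch; `to_numpy_state` is a hypothetical helper, and the exact hook point in your agent module may differ:

```python
import numpy as np

def to_numpy_state(obs_dict):
    """Stack per-agent observations {"0": ..., "1": ...} into one array."""
    return np.stack([np.asarray(obs_dict[k]) for k in sorted(obs_dict)])

states = to_numpy_state({"0": [0.1, 0.2], "1": [0.3, 0.4]})
print(states.shape)  # (2, 2): one row per agent
```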
[3] In "standalone" mode, each agent runs in an independent thread. Whether they run synchronously depends on the environment: some environments guarantee that all agents step at the same time point, while others are completely asynchronous.
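A self-contained schematic of the per-agent-thread pattern described here, using stub classes; it is not the actual implementation of `self.bot.do_multi_job`:

```python
import threading

class _StubEnv:
    """Stub standalone-style environment: one interface per agent."""

    def reset_agent(self, agent_id):  # hypothetical per-agent reset
        return [0.0]

    def step(self, agent_id, action):
        return [0.0], 0.0, True, {}   # ends the episode immediately

class _StubAgent:
    def infer_action(self, raw_state):
        return 0

def run_one_episode(agent_id, env, agent):
    # One full episode for one agent, run inside its own thread.
    state = env.reset_agent(agent_id)
    done = False
    while not done:
        action = agent.infer_action(state)
        state, reward, done, info = env.step(agent_id, action)

env = _StubEnv()
agents = {"0": _StubAgent(), "1": _StubAgent()}
threads = [threading.Thread(target=run_one_episode, args=(aid, env, ag))
           for aid, ag in agents.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```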
I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.
Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?