huawei-noah / xingtian

xingtian is a componentized library for the development and verification of reinforcement learning algorithms

Use of custom environment and agents #10

Open · Adaickalavan opened this issue 3 years ago

Adaickalavan commented 3 years ago

I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.

Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?

hustqj commented 3 years ago

I will upload a multi-agent PPO example; you can refer to it.

hustqj commented 3 years ago

I have added a new example; you can find it in xingtian/examples/ma_cases/ppo_share_catch_pigs.yaml.

Adaickalavan commented 3 years ago

I have several questions as follows:

[1] Could you explain, with an example, the difference between setting self.env_info["api_type"] == "standalone" and self.env_info["api_type"] == "unified"? When should we use each of them?

[2] I tried using a custom environment and a custom agent in XingTian, with 2 agents (i.e., multi-agent) training against 1 environment and 1 learner. The custom environment accepts agent actions in the dict format {"0": Action_of_agent_0, "1": Action_of_agent_1} and returns a dict of the format {"0": Observation_of_agent_0, "1": Observation_of_agent_1} on reset and on each step. The custom agent implements an infer_action function which accepts an input raw_state of the format Observation_of_agent_x and returns an action of the format Action_of_agent_x.
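For concreteness, here is a minimal sketch of the interfaces just described. The class names (CustomEnv, CustomAgent), the observation shapes, and the method bodies are illustrative assumptions, not XingTian's API:

```python
import numpy as np


class CustomEnv:
    """Multi-agent environment keyed by agent id ("0", "1")."""

    def reset(self):
        # Returns {"0": observation_of_agent_0, "1": observation_of_agent_1}.
        return {"0": np.zeros(4), "1": np.zeros(4)}

    def step(self, actions):
        # actions: {"0": action_of_agent_0, "1": action_of_agent_1}.
        obs = {"0": np.zeros(4), "1": np.zeros(4)}
        rewards = {"0": 0.0, "1": 0.0}
        done = False
        info = {}
        return obs, rewards, done, info


class CustomAgent:
    def infer_action(self, raw_state):
        # raw_state: the observation of this one agent only.
        return 0  # a discrete action, for illustration
```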

[2a] When the training was run with api_type==unified, the following error message was printed:

[2b] On the other hand, when the training was run with api_type==standalone, the following error message was printed:

[2c] What should I do to achieve two-agent (i.e., multi-agent) training with N environments and M learners, using the above custom environment and custom agent interfaces?

[3] Refer to this portion of the code: https://github.com/huawei-noah/xingtian/blob/9dee512245f777c20c65b8e198fde95ab61aa507/xt/framework/agent_group.py#L436-L462. Assume we are training 2 agents (i.e., multi-agent) with 1 environment and 1 learner. When using api_type==standalone, each agent appears to be executed in the same environment instance for one full episode, in a separate thread, via self.bot.do_multi_job(job_funcs, _paras). Do the agent threads run synchronously with respect to each other?

hustqj commented 3 years ago

[1] "standalone" means the simulator provides an independent interface for each agent; "unified" means all agents share one interface, as SMARTS does.
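To make the distinction concrete, here is a hedged sketch of the two calling conventions. The environment classes and step signatures below are illustrative assumptions, not XingTian's actual interfaces:

```python
class UnifiedEnv:
    """All agents share one interface (as in SMARTS)."""

    def step(self, actions):
        # actions: {"0": action_0, "1": action_1}, all in a single call.
        obs = {agent_id: [0.0] for agent_id in actions}
        rewards = {agent_id: 0.0 for agent_id in actions}
        return obs, rewards


class StandaloneEnv:
    """The simulator exposes an independent interface per agent."""

    def step(self, agent_id, action):
        # One call per agent, touching only that agent's interface.
        return [0.0], 0.0


unified = UnifiedEnv()
obs, rewards = unified.step({"0": 1, "1": 0})

standalone = StandaloneEnv()
obs_0, reward_0 = standalone.step("0", 1)  # agent 0's own call
obs_1, reward_1 = standalone.step("1", 0)  # agent 1's own call
```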

hustqj commented 3 years ago

[2a] You should convert the observation to a numpy type in your agent module. [2c] You can set env_num=N to interact with N environments, but we only support one learner for now; all training data from the N environments will be sent to that learner for training.
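A sketch of the [2a] fix: convert each agent's raw observation to a numpy array before it reaches the model. The helper below is an assumption for illustration, not part of XingTian:

```python
import numpy as np


def to_numpy_state(raw_state):
    """Convert one agent's raw observation to a numpy array, per [2a]."""
    return np.asarray(raw_state, dtype=np.float32)

# Inside your agent's infer_action, something like:
#     state = to_numpy_state(raw_state)
#     action = ...  # feed `state` to your policy
```

For [2c], env_num=N would presumably be set in the training YAML alongside the other settings; the ppo_share_catch_pigs.yaml example mentioned above is the natural reference for the exact key names.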

hustqj commented 3 years ago

[3] In "standalone" mode, each agent runs in an independent thread. Whether they run synchronously depends on the environment: some environments guarantee that all agents run at the same time point, while others are completely asynchronous.
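A toy sketch of that execution model, assuming hypothetical per-agent reset/step methods (the stubs below stand in for a real environment and agents; they are not XingTian's do_multi_job machinery):

```python
import threading


class StubEnv:
    """Stand-in environment with hypothetical per-agent methods."""

    def reset_agent(self, agent_id):
        return [0.0]

    def step_agent(self, agent_id, action):
        # Returns (observation, reward, done, info); done immediately
        # so this sketch terminates.
        return [0.0], 0.0, True, {}


class StubAgent:
    def infer_action(self, raw_state):
        return 0


env = StubEnv()
agents = {"0": StubAgent(), "1": StubAgent()}


def run_agent_episode(agent_id):
    # Stand-in for the per-agent job dispatched by do_multi_job.
    raw_state = env.reset_agent(agent_id)
    done = False
    while not done:
        action = agents[agent_id].infer_action(raw_state)
        raw_state, _reward, done, _info = env.step_agent(agent_id, action)


# One thread per agent, mirroring "standalone" mode. Whether the two
# threads advance the simulator in lock-step depends on the environment.
threads = [threading.Thread(target=run_agent_episode, args=(agent_id,))
           for agent_id in ("0", "1")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```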