huawei-noah / xingtian

xingtian is a componentized library for the development and verification of reinforcement learning algorithms

Use of custom environment and agents #10

Open · Adaickalavan opened this issue 3 years ago

Adaickalavan commented 3 years ago

I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.

Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?

hustqj commented 3 years ago

I will upload a multi-agent PPO example; you can refer to it.

hustqj commented 3 years ago

I have added a new example; you can find it in xingtian/examples/ma_cases/ppo_share_catch_pigs.yaml.

Adaickalavan commented 3 years ago

I have several questions as follows:

[1] Could you explain, with an example, the difference between setting self.env_info["api_type"] == "standalone" and self.env_info["api_type"] == "unified"? When should we use each of them?

[2] I tried using a custom environment and a custom agent in XingTian, with 2 agents (i.e., multi-agent) training against 1 environment and 1 learner. The custom environment accepts agent actions in the dict format {"0": Action_of_agent_0, "1": Action_of_agent_1} and returns a dict of the format {"0": Observation_of_agent_0, "1": Observation_of_agent_1} on reset and on each step. The custom agent implements an infer_action function which accepts an input raw_state of the format Observation_of_agent_x and returns an action of the format Action_of_agent_x.
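For concreteness, here is a minimal sketch of the interfaces just described. The class names (CustomEnv, CustomAgent), the observation shapes, and the method bodies are illustrative assumptions, not XingTian's API:

```python
import numpy as np


class CustomEnv:
    """Multi-agent environment keyed by agent id ("0", "1")."""

    def reset(self):
        # Returns {"0": observation_of_agent_0, "1": observation_of_agent_1}.
        return {"0": np.zeros(4), "1": np.zeros(4)}

    def step(self, actions):
        # actions: {"0": action_of_agent_0, "1": action_of_agent_1}.
        obs = {"0": np.zeros(4), "1": np.zeros(4)}
        rewards = {"0": 0.0, "1": 0.0}
        done = False
        info = {}
        return obs, rewards, done, info


class CustomAgent:
    def infer_action(self, raw_state):
        # raw_state: the observation of this one agent only.
        return 0  # a discrete action, for illustration
```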

[2a] When the training was run with api_type==unified, the following error message was printed:

[2b] On the other hand, when the training was run with api_type==standalone, the following error message was printed:

[2c] What should I do to achieve two-agent (i.e., multi-agent) training with N environments and M learners, using the above custom environment and custom agent interfaces?

[3] Refer to this portion of the code: https://github.com/huawei-noah/xingtian/blob/9dee512245f777c20c65b8e198fde95ab61aa507/xt/framework/agent_group.py#L436-L462. Assume we are training 2 agents (i.e., multi-agent) with 1 environment and 1 learner. When using api_type==standalone, each agent appears to be executed in the same environment instance for one full episode, in a separate thread, via self.bot.do_multi_job(job_funcs, _paras). Do the agent threads run synchronously with respect to each other?

hustqj commented 3 years ago

[1] "standalone" means the simulator provides an independent interface for each agent; "unified" means all agents share one interface, as SMARTS does.
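To make the distinction concrete, here is a hedged sketch of the two calling conventions. The environment classes and step signatures below are illustrative assumptions, not XingTian's actual interfaces:

```python
class UnifiedEnv:
    """All agents share one interface (as in SMARTS)."""

    def step(self, actions):
        # actions: {"0": action_0, "1": action_1}, all in a single call.
        obs = {agent_id: [0.0] for agent_id in actions}
        rewards = {agent_id: 0.0 for agent_id in actions}
        return obs, rewards


class StandaloneEnv:
    """The simulator exposes an independent interface per agent."""

    def step(self, agent_id, action):
        # One call per agent, touching only that agent's interface.
        return [0.0], 0.0


unified = UnifiedEnv()
obs, rewards = unified.step({"0": 1, "1": 0})

standalone = StandaloneEnv()
obs_0, reward_0 = standalone.step("0", 1)  # agent 0's own call
obs_1, reward_1 = standalone.step("1", 0)  # agent 1's own call
```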

hustqj commented 3 years ago

[2a] You should convert the observation to a numpy type in your agent module. [2c] You can set env_num=N to interact with N environments, but we only support one learner for now; all training data from the N environments will be sent to that learner for training.
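A sketch of the [2a] fix: convert each agent's raw observation to a numpy array before it reaches the model. The helper below is an assumption for illustration, not part of XingTian:

```python
import numpy as np


def to_numpy_state(raw_state):
    """Convert one agent's raw observation to a numpy array, per [2a]."""
    return np.asarray(raw_state, dtype=np.float32)

# Inside your agent's infer_action, something like:
#     state = to_numpy_state(raw_state)
#     action = ...  # feed `state` to your policy
```

For [2c], env_num=N would presumably be set in the training YAML alongside the other settings; the ppo_share_catch_pigs.yaml example mentioned above is the natural reference for the exact key names.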

hustqj commented 3 years ago

[3] In "standalone" mode, each agent runs in an independent thread. Whether they run synchronously depends on the environment: some environments guarantee that all agents run at the same time point, while others are completely asynchronous.
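A toy sketch of that execution model, assuming hypothetical per-agent reset/step methods (the stubs below stand in for a real environment and agents; they are not XingTian's do_multi_job machinery):

```python
import threading


class StubEnv:
    """Stand-in environment with hypothetical per-agent methods."""

    def reset_agent(self, agent_id):
        return [0.0]

    def step_agent(self, agent_id, action):
        # Returns (observation, reward, done, info); done immediately
        # so this sketch terminates.
        return [0.0], 0.0, True, {}


class StubAgent:
    def infer_action(self, raw_state):
        return 0


env = StubEnv()
agents = {"0": StubAgent(), "1": StubAgent()}


def run_agent_episode(agent_id):
    # Stand-in for the per-agent job dispatched by do_multi_job.
    raw_state = env.reset_agent(agent_id)
    done = False
    while not done:
        action = agents[agent_id].infer_action(raw_state)
        raw_state, _reward, done, _info = env.step_agent(agent_id, action)


# One thread per agent, mirroring "standalone" mode. Whether the two
# threads advance the simulator in lock-step depends on the environment.
threads = [threading.Thread(target=run_agent_episode, args=(agent_id,))
           for agent_id in ("0", "1")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```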