salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

Some questions about the environment #10

Closed · ghost closed this issue 2 years ago

ghost commented 3 years ago

I think the idea of environment scheduling is very novel: multiple environments and multiple agents are scheduled on the GPU, which improves GPU utilization. I have some questions about the tag-continuous environment:

sunil-s commented 3 years ago

Hello @WarmHouse, the "continuous" here refers to the observation space: agents can move around smoothly on a 2D plane, unlike tag-gridworld, where they can only move from one grid point to another. The actions (in this environment, acceleration and turn actions) are still discrete, and the agents' movements are governed by equations of motion driven by these actions.
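
To make that concrete, here is a minimal sketch (not the WarpDrive implementation; the discrete action values, time step, and speed cap are all assumptions) of how discrete acceleration and turn actions can still produce smooth motion on a 2D plane:

```python
import numpy as np

# Illustrative only: assumed discrete action menus, not WarpDrive's actual values.
ACCELERATIONS = np.array([0.0, 0.5, 1.0])   # discrete acceleration choices
TURNS = np.array([-0.1, 0.0, 0.1])          # discrete heading changes (radians)

def step(pos, speed, heading, accel_action, turn_action, dt=0.1, max_speed=1.0):
    """Advance one agent by one time step using simple equations of motion."""
    heading = heading + TURNS[turn_action]
    speed = np.clip(speed + ACCELERATIONS[accel_action] * dt, 0.0, max_speed)
    pos = pos + speed * dt * np.array([np.cos(heading), np.sin(heading)])
    return pos, speed, heading
```

Even though each action is one of a handful of discrete choices, the resulting positions vary continuously over the plane.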

The sample training configuration (https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/run_configs/run_config_tag_continuous.yaml) defaults to the "A2C" algorithm. To run with PPO instead, change the algorithm value to "PPO" and relaunch the training script.
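
If you would rather override the value programmatically than edit the file, a sketch like the one below would work (the recursive search is just to avoid hard-coding the key's nesting inside the YAML, which I'm not asserting here):

```python
import yaml

# Load the sample run config linked above and switch the algorithm to PPO.
with open("warp_drive/training/run_configs/run_config_tag_continuous.yaml") as f:
    run_config = yaml.safe_load(f)

def set_algorithm(node, value):
    """Recursively replace every 'algorithm' entry in the parsed config."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "algorithm":
                node[key] = value
            else:
                set_algorithm(child, value)
    elif isinstance(node, list):
        for child in node:
            set_algorithm(child, value)

set_algorithm(run_config, "PPO")
```

You can then hand the modified config to your training entry point as usual.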

Hope this helps. Thanks!

ghost commented 3 years ago

I thought the PPO algorithm was only used for continuous action spaces. Am I right? Can it be used for tag-continuous, which is a discrete action space environment?

sunil-s commented 3 years ago

PPO can be used for both types of action spaces. Please see https://spinningup.openai.com/en/latest/algorithms/ppo.html for a review.
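
For a discrete action space, the only change from the continuous case is that the policy outputs a categorical distribution; the clipped surrogate objective itself is unchanged. A minimal PyTorch sketch (tensor names and shapes are illustrative, not the WarpDrive API):

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(logits, actions, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss for a categorical (discrete-action) policy.

    logits: (batch, n_actions) policy outputs
    actions: (batch,) integer actions taken during rollout
    old_log_probs: (batch,) log-probs of those actions under the old policy
    advantages: (batch,) advantage estimates
    """
    # Log-probability of the taken discrete actions under the current policy.
    log_probs = F.log_softmax(logits, dim=-1).gather(
        -1, actions.unsqueeze(-1)
    ).squeeze(-1)
    # Probability ratio between the new and old policies.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective (maximized, so negate to get a loss).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```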