salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

Some questions about the environment #10

Closed · ghost closed this issue 2 years ago

ghost commented 3 years ago

I think the idea of environment scheduling is very novel: multiple environments and multiple agents are scheduled on the GPU, which improves GPU utilization. I have some questions about the tag-continuous environment:

sunil-s commented 3 years ago

Hello @WarmHouse, the "continuous" here refers to the observation space: agents can move around smoothly on a 2D plane, unlike tag-gridworld, where they can only move from one grid point to another. The actions (in this environment, acceleration and turn actions) are still discrete, and the agents' movements are governed by equations of motion driven by these actions.
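
To make that concrete, here is a minimal sketch (not the WarpDrive implementation; the discrete action values, time step, and speed cap are all assumptions) of how discrete acceleration and turn actions can still produce smooth motion on a 2D plane:

```python
import numpy as np

# Illustrative only: assumed discrete action menus, not WarpDrive's actual values.
ACCELERATIONS = np.array([0.0, 0.5, 1.0])   # discrete acceleration choices
TURNS = np.array([-0.1, 0.0, 0.1])          # discrete heading changes (radians)

def step(pos, speed, heading, accel_action, turn_action, dt=0.1, max_speed=1.0):
    """Advance one agent by one time step using simple equations of motion."""
    heading = heading + TURNS[turn_action]
    speed = np.clip(speed + ACCELERATIONS[accel_action] * dt, 0.0, max_speed)
    pos = pos + speed * dt * np.array([np.cos(heading), np.sin(heading)])
    return pos, speed, heading
```

Even though each action is one of a handful of discrete choices, the resulting positions vary continuously over the plane.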

The sample training configuration (https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/run_configs/run_config_tag_continuous.yaml) defaults to the "A2C" algorithm. To run with PPO instead, change the algorithm value to "PPO" and relaunch the training script.
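
If you would rather override the value programmatically than edit the file, a sketch like the one below would work (the recursive search is just to avoid hard-coding the key's nesting inside the YAML, which I'm not asserting here):

```python
import yaml

# Load the sample run config linked above and switch the algorithm to PPO.
with open("warp_drive/training/run_configs/run_config_tag_continuous.yaml") as f:
    run_config = yaml.safe_load(f)

def set_algorithm(node, value):
    """Recursively replace every 'algorithm' entry in the parsed config."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "algorithm":
                node[key] = value
            else:
                set_algorithm(child, value)
    elif isinstance(node, list):
        for child in node:
            set_algorithm(child, value)

set_algorithm(run_config, "PPO")
```

You can then hand the modified config to your training entry point as usual.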

Hope this helps. Thanks!

ghost commented 3 years ago

I thought the PPO algorithm was only used for continuous action spaces. Am I right? Can it be used for tag-continuous, which is a discrete action space environment?

sunil-s commented 3 years ago

PPO can be used for both types of action spaces. Please see https://spinningup.openai.com/en/latest/algorithms/ppo.html for a review.
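
For a discrete action space, the only change from the continuous case is that the policy outputs a categorical distribution; the clipped surrogate objective itself is unchanged. A minimal PyTorch sketch (tensor names and shapes are illustrative, not the WarpDrive API):

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(logits, actions, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss for a categorical (discrete-action) policy.

    logits: (batch, n_actions) policy outputs
    actions: (batch,) integer actions taken during rollout
    old_log_probs: (batch,) log-probs of those actions under the old policy
    advantages: (batch,) advantage estimates
    """
    # Log-probability of the taken discrete actions under the current policy.
    log_probs = F.log_softmax(logits, dim=-1).gather(
        -1, actions.unsqueeze(-1)
    ).squeeze(-1)
    # Probability ratio between the new and old policies.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective (maximized, so negate to get a loss).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```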