Closed: ghost closed this issue 2 years ago.
Hello @WarmHouse ,
The "continuous" here refers to the observation space: agents can move smoothly on a 2D plane, as opposed to tag-gridworld, where they can only move from one grid point to another. The actions (in this environment, acceleration and turn actions) are still discrete, and each agent's movement is governed by equations of motion driven by these actions.
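To make the distinction concrete, here is a minimal sketch of how discrete acceleration and turn choices can still produce continuous positions. The action tables, `max_speed`, `dt`, and the Euler integration are illustrative assumptions, not the actual tag-continuous dynamics in WarpDrive; names such as `AgentState` and `step` are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical discrete action tables (illustrative values only,
# not taken from WarpDrive's tag-continuous environment).
ACCELERATIONS = (-1.0, 0.0, 1.0)            # change in speed per step
TURNS = (-math.pi / 8, 0.0, math.pi / 8)    # change in heading per step


@dataclass
class AgentState:
    x: float
    y: float
    heading: float  # radians
    speed: float


def step(state: AgentState, accel_idx: int, turn_idx: int,
         dt: float = 1.0, max_speed: float = 2.0) -> AgentState:
    """Apply one pair of discrete action indices and integrate the motion."""
    # The policy only picks indices; the resulting speed is clipped to [0, max_speed].
    speed = min(max(state.speed + ACCELERATIONS[accel_idx] * dt, 0.0), max_speed)
    heading = (state.heading + TURNS[turn_idx]) % (2 * math.pi)
    # Simple Euler integration: the position moves continuously in the plane.
    x = state.x + speed * math.cos(heading) * dt
    y = state.y + speed * math.sin(heading) * dt
    return AgentState(x, y, heading, speed)


if __name__ == "__main__":
    s = AgentState(0.0, 0.0, 0.0, 0.0)
    for accel_idx, turn_idx in [(2, 1), (2, 2), (1, 2)]:
        s = step(s, accel_idx, turn_idx)
        print(f"x={s.x:.3f} y={s.y:.3f} heading={s.heading:.3f} speed={s.speed:.1f}")
```

The action space here has only 3 x 3 = 9 choices per step, yet the agent's (x, y) position can take any real value, which is the sense in which such an environment is "continuous" while its actions remain discrete.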
The sample training configuration (https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/run_configs/run_config_tag_continuous.yaml) defaults to the "A2C" algorithm. To run with PPO, change the algorithm value to "PPO" and relaunch the training script.
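If you prefer to switch the algorithm programmatically rather than editing the YAML by hand, a sketch like the one below works on the parsed config. It assumes the file has been loaded into plain dicts and lists (for example with PyYAML's `yaml.safe_load`, then written back with `yaml.safe_dump`) and that the setting appears as one or more `algorithm` keys; the nested layout in the usage example is hypothetical, so check the actual run config for its real structure. `switch_algorithm` is a hypothetical helper, not part of WarpDrive.

```python
from typing import Any


def switch_algorithm(config: Any, old: str = "A2C", new: str = "PPO") -> int:
    """Recursively replace every `algorithm: old` entry with `new`, in place.

    Returns the number of entries that were changed.
    """
    changed = 0
    if isinstance(config, dict):
        for key, value in config.items():
            if key == "algorithm" and value == old:
                # Reassigning an existing key is safe while iterating.
                config[key] = new
                changed += 1
            else:
                changed += switch_algorithm(value, old, new)
    elif isinstance(config, list):
        for item in config:
            changed += switch_algorithm(item, old, new)
    return changed


if __name__ == "__main__":
    # Hypothetical per-policy layout; the real YAML may differ.
    run_config = {
        "policy": {
            "tagger": {"to_train": True, "algorithm": "A2C"},
            "runner": {"to_train": True, "algorithm": "A2C"},
        }
    }
    n = switch_algorithm(run_config)
    print(n, run_config["policy"]["tagger"]["algorithm"])
```

Walking the whole tree, rather than hard-coding a path, avoids depending on a config layout that may change between versions.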
Hope this helps. Thanks!
I thought the PPO algorithm was meant for continuous action spaces. Am I right? Can it be used for tag-continuous, which is a discrete action space environment?
PPO can be used with both discrete and continuous action spaces. Please see https://spinningup.openai.com/en/latest/algorithms/ppo.html for a review.
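One way to see why PPO is not tied to continuous actions: its clipped surrogate objective only needs the log-probability of the action taken under the new and old policies. A softmax (categorical) policy supplies that for discrete actions, and a Gaussian policy supplies it for continuous ones. The single-sample sketch below uses plain Python; it is an illustration of the PPO-Clip formula, not WarpDrive's implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def categorical_log_prob(logits: Sequence[float], action: int) -> float:
    """Log-probability of a discrete action under a softmax policy."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z


def gaussian_log_prob(mean: float, std: float, action: float) -> float:
    """Log-probability of a 1-D continuous action under a Gaussian policy."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def ppo_clip_objective(new_logp: float, old_logp: float,
                       advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-Clip surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Discrete case: 3 actions, the policy shifts toward action 0.
    d = ppo_clip_objective(categorical_log_prob([0.5, 0.0, 0.0], 0),
                           categorical_log_prob([0.0, 0.0, 0.0], 0),
                           advantage=1.0)
    # Continuous case: Gaussian mean moves from 0.0 to 0.1.
    c = ppo_clip_objective(gaussian_log_prob(0.1, 1.0, 0.5),
                           gaussian_log_prob(0.0, 1.0, 0.5),
                           advantage=1.0)
    print(f"discrete: {d:.4f}  continuous: {c:.4f}")
```

The objective function is identical in both cases; only the log-probability function changes, which is why tag-continuous, with its discrete acceleration and turn actions, can be trained with PPO as well as A2C.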
I think the idea of environment scheduling is very novel: multiple environments and multiple agents are scheduled on the GPU, which improves GPU utilization. I have a question about tag-continuous: does "continuous" mean a continuous action space? As far as I can see, the action space of tag-continuous is actually discrete.