Policy Adaptation Survey
Environment
[Demo GIFs: Quadrotor transportation | Cartpole | Hover | Brax HTML]
Train
Torch-based environment
cd adaptive_control_gym/controller/rl
# Run the train function with the parsed arguments
python train.py \
--use_wandb $USE_WANDB \
--program $PROGRAM \
--seed $SEED \
--gpu_id $GPU_ID \
--act_expert_mode $ACT_EXPERT_MODE \
--cri_expert_mode $CRI_EXPERT_MODE \
--exp_name $EXP_NAME \
--compressor_dim $COMPRESSOR_DIM \
--search_dim $SEARCH_DIM \
--res_dyn_param_dim $RES_DYN_PARAM_DIM \
--task $TASK \
--resume_path $RESUME_PATH \
--drone_num $DRONE_NUM \
--env_num $ENV_NUM \
--total_steps $TOTAL_STEPS \
--adapt_steps $ADAPT_STEPS \
--curri_thereshold $CURRI_THERESHOLD
- use_wandb: A boolean flag indicating whether to use the Weights & Biases service for logging and visualization. Default is False.
- program: A string specifying the name of the program. Default is 'tmp'.
- seed: An integer specifying the random seed to use. Default is 1.
- gpu_id: An integer specifying the ID of the GPU to use. Default is 0.
- act_expert_mode: An integer specifying the expert mode for the actor network. Default is 0.
- cri_expert_mode: An integer specifying the expert mode for the critic network. Default is 0.
- exp_name: A string specifying the name of the experiment. Default is an empty string.
- compressor_dim: An integer specifying the dimension of the compressor network. Default is 4.
- search_dim: An integer specifying the dimension of the search network. Default is 0.
- res_dyn_param_dim: An integer specifying the dimension of the residual dynamic parameter network. Default is 0.
- task: A string specifying the task to perform. Can be 'track', 'hover', or 'avoid'. Default is 'track'.
- resume_path: A string specifying the path to a saved checkpoint to resume training from. Default is None.
- drone_num: An integer specifying the number of drones to use. Default is 1.
- env_num: An integer specifying the number of environments to use. Default is 16384.
- total_steps: An integer specifying the total number of training steps to perform. Default is 8e7.
- adapt_steps: An integer specifying the number of adaptation steps to perform. Default is 5e6.
- curri_thereshold: A float specifying the curriculum threshold. Default is 0.2.
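For orientation, the interface above can be sketched as a standard argparse parser. This is a reconstruction from the flag list, not the repo's actual code; in particular, the handling of --use_wandb is an assumption chosen to reconcile the two usages shown in this README (passed with a value as $USE_WANDB above, and passed bare in the examples below):

```python
import argparse

def parse_args():
    # Sketch of train.py's CLI, reconstructed from the documented flag list.
    p = argparse.ArgumentParser()
    # Accepts both a bare flag (--use_wandb) and an explicit value
    # (--use_wandb True); the real parser may handle this differently.
    p.add_argument('--use_wandb', nargs='?', const=True, default=False,
                   type=lambda s: s.lower() in ('true', '1'))
    p.add_argument('--program', type=str, default='tmp')
    p.add_argument('--seed', type=int, default=1)
    p.add_argument('--gpu_id', type=int, default=0)
    p.add_argument('--act_expert_mode', type=int, default=0)
    p.add_argument('--cri_expert_mode', type=int, default=0)
    p.add_argument('--exp_name', type=str, default='')
    p.add_argument('--compressor_dim', type=int, default=4)
    p.add_argument('--search_dim', type=int, default=0)
    p.add_argument('--res_dyn_param_dim', type=int, default=0)
    p.add_argument('--task', type=str, default='track',
                   choices=['track', 'hover', 'avoid'])
    p.add_argument('--resume_path', type=str, default=None)
    p.add_argument('--drone_num', type=int, default=1)
    p.add_argument('--env_num', type=int, default=16384)
    p.add_argument('--total_steps', type=int, default=int(8e7))
    p.add_argument('--adapt_steps', type=int, default=int(5e6))
    p.add_argument('--curri_thereshold', type=float, default=0.2)
    return p.parse_args()

if __name__ == '__main__':
    print(parse_args())
```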
Brax-based environment
cd adaptive_control_gym/envs/brax
python train_brax.py
python play_brax.py --policy-type ppo --policy-path '../results/params' # visualize
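play_brax.py presumably loads the saved parameters and renders a rollout to an HTML file. Below is a minimal sketch of that flow, assuming the standard Brax v2 APIs (brax.envs.create, brax.io.model.load_params, brax.io.html.render); the environment name, rollout length, and zero-action policy are placeholders, not the repo's actual setup:

```python
import jax
import jax.numpy as jnp
from brax import envs
from brax.io import html, model

# Placeholder environment name; the repo registers its own Brax env.
env = envs.create(env_name="ant")
# Parameters saved by train_brax.py; a trained policy network would consume
# these. The zero-action rollout below keeps the sketch self-contained.
params = model.load_params("../results/params")

jit_reset = jax.jit(env.reset)
jit_step = jax.jit(env.step)

state = jit_reset(jax.random.PRNGKey(0))
rollout = [state.pipeline_state]
for _ in range(200):
    action = jnp.zeros(env.action_size)  # replace with policy(params, obs)
    state = jit_step(state, action)
    rollout.append(state.pipeline_state)

# Write an interactive HTML visualization of the rollout.
with open("rollout.html", "w") as f:
    f.write(html.render(env.sys, rollout))
```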
Examples
# train with RMA
python train.py --exp_name "TrackRMA" --task track --act_expert_mode 1 --cri_expert_mode 1 --use_wandb --gpu_id 0
# train a robust policy
python train.py --exp_name "TrackRobust" --task track --use_wandb --gpu_id 0
Play with environment
# go to environment folder
cd adaptive_control_gym/envs
# run environment
# policy_type: 'random' or 'pid'; task: 'track', 'hover', or 'avoid';
# policy_path is only read by a PPO policy; gpu_id -1 runs on the CPU;
# enable_log logs parameters to CSV and plots them; enable_vis
# visualizes with meshcat; curri_param 0.0 selects the simple case
python quadtrans.py \
--policy_type pid \
--task "avoid" \
--policy_path 'ppo.pt' \
--seed 0 \
--env_num 1 \
--drone_num 1 \
--gpu_id -1 \
--enable_log true \
--enable_vis true \
--curri_param 1.0
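With --enable_log true, the run writes its logged parameters to a CSV alongside the generated plots. A quick way to inspect such a log offline is sketched below; the filename and column layout are hypothetical, since the actual output path is set inside quadtrans.py:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log path; check quadtrans.py for the real output location.
df = pd.read_csv("quadtrans_log.csv")
# One subplot per logged quantity.
df.plot(subplots=True, figsize=(8, 2 * len(df.columns)))
plt.tight_layout()
plt.savefig("quadtrans_log.png")
```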
[Demo GIFs: task=avoid curri_param=1.0 | task=avoid curri_param=0.0 | task=track | task=hover]
Policy