jc-bao / policy-adaptation-survey

This repository compares prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

🚁 Policy Adaptation Survey

🤖 Environment

Supported environments: quadrotor transportation, cartpole, and hover, plus a Brax-based variant (with HTML visualization).

*(Environment preview images: quadrotor, cartpole, hover.)*

🐣 Train

Torch-based environment

```shell
cd adaptive_control_gym/controller/rl
# Run the train function with the parsed arguments
python train.py \
    --use_wandb $USE_WANDB \
    --program $PROGRAM \
    --seed $SEED \
    --gpu_id $GPU_ID \
    --act_expert_mode $ACT_EXPERT_MODE \
    --cri_expert_mode $CRI_EXPERT_MODE \
    --exp_name $EXP_NAME \
    --compressor_dim $COMPRESSOR_DIM \
    --search_dim $SEARCH_DIM \
    --res_dyn_param_dim $RES_DYN_PARAM_DIM \
    --task $TASK \
    --resume_path $RESUME_PATH \
    --drone_num $DRONE_NUM \
    --env_num $ENV_NUM \
    --total_steps $TOTAL_STEPS \
    --adapt_steps $ADAPT_STEPS \
    --curri_thereshold $CURRI_THERESHOLD
```
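The training entry point above is driven entirely by command-line flags. As a rough illustration of how such a flag set is typically declared with `argparse`, here is a hypothetical sketch; the types and defaults are assumptions for illustration, not the repo's actual definitions:

```python
# Hypothetical sketch of train.py's flag parsing with argparse.
# Flag names mirror the command above; defaults are illustrative only.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Train an adaptive control policy")
    p.add_argument("--use_wandb", type=int, default=0)      # 1 = log to wandb
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--gpu_id", type=int, default=-1)        # -1 = CPU
    p.add_argument("--task", type=str, default="track")     # track / hover / avoid
    p.add_argument("--drone_num", type=int, default=1)
    p.add_argument("--env_num", type=int, default=1)
    p.add_argument("--total_steps", type=int, default=1_000_000)
    p.add_argument("--adapt_steps", type=int, default=0)
    p.add_argument("--curri_thereshold", type=float, default=0.0)
    return p

# Parse an example command line instead of sys.argv for demonstration.
args = build_parser().parse_args(["--task", "hover", "--seed", "3"])
print(args.task, args.seed)
```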

Brax-based environment

```shell
cd adaptive_control_gym/envs/brax
python train_brax.py
python play_brax.py --policy-type ppo --policy-path '../results/params' # visualize
```

Examples

```shell
# train with RMA
python train.py --exp-name "TrackRMA" --task track --act-expert-mode 1 --cri-expert-mode 1 --use-wandb --gpu-id 0
# train a robust policy
python train.py --exp-name "TrackRobust" --task track --use-wandb --gpu-id 0
```
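In RMA-style training, a privileged encoder maps ground-truth environment parameters to a latent used by the policy; a second phase then distills an adaptation module that predicts that latent from onboard state-action history, so the deployed policy needs no privileged inputs. The miniature sketch below illustrates that second phase with plain NumPy and a linear "adaptation module"; every name and dimension is invented for illustration and none of this is the repo's implementation:

```python
# Toy sketch of RMA phase 2: regress an adaptation module so that
# phi(history) reproduces the frozen privileged encoder's latent.
import numpy as np

rng = np.random.default_rng(0)
param_dim, latent_dim, hist_dim, n = 4, 2, 12, 512

E = rng.normal(size=(latent_dim, param_dim))   # frozen privileged encoder (from phase 1)
env_params = rng.normal(size=(n, param_dim))   # privileged dynamics parameters
z_target = env_params @ E.T                    # latent the policy was trained with

# Assume the observed history is a noisy linear function of the hidden params.
H = rng.normal(size=(hist_dim, param_dim))
history = env_params @ H.T + 0.01 * rng.normal(size=(n, hist_dim))

# Phase 2 distillation: fit phi (here by least squares; RMA uses a small
# network trained with the same regression loss).
phi, *_ = np.linalg.lstsq(history, z_target, rcond=None)
z_hat = history @ phi
err = np.mean((z_hat - z_target) ** 2)
print(f"distillation MSE: {err:.4f}")
```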

🕹 Play with the environment

```shell
# go to the environment folder
cd adaptive_control_gym/envs

# run the environment
#   --policy_type: 'random' or 'pid'
#   --task: 'track', 'hover', or 'avoid'
#   --policy_path: checkpoint, used for PPO only
#   --gpu_id -1: use the CPU
#   --enable_log: log parameters to CSV and plot
#   --enable_vis: visualize with meshcat
#   --curri_param: 0.0 for the simple case
python quadtrans.py \
    --policy_type pid \
    --task "avoid" \
    --policy_path 'ppo.pt' \
    --seed 0 \
    --env_num 1 \
    --drone_num 1 \
    --gpu_id -1 \
    --enable_log true \
    --enable_vis true \
    --curri_param 1.0
```
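`--policy_type pid` selects a classical PID baseline. As a minimal, self-contained sketch of what such a controller does, here is a PID loop driving a toy double-integrator "hover" plant to a setpoint; the gains, plant, and setpoint are illustrative assumptions, not the repo's actual controller:

```python
# Minimal PID sketch on a double integrator (x'' = u), illustrative only.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None  # avoid a derivative kick on the first step

    def __call__(self, err):
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

dt, setpoint = 0.02, 1.0
x, v = 0.0, 0.0  # position, velocity
ctrl = PID(kp=8.0, ki=0.5, kd=4.0, dt=dt)
for _ in range(1500):           # 30 simulated seconds
    u = ctrl(setpoint - x)      # control from tracking error
    v += u * dt                 # semi-implicit Euler integration
    x += v * dt
print(f"final position: {x:.3f}")
```

With these gains the closed loop is well damped, so the position settles near the setpoint within the simulated horizon.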
*(Result snapshots: task=avoid with curri_param=1.0, task=avoid with curri_param=0.0, task=track, task=hover.)*

πŸ’ Policy