shamilmamedov / flexible_arm


Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots

Method

[Figure: overview of the method]

Nonlinear MPC performance

https://github.com/shamilmamedov/flexible_arm/assets/59015432/d1e1a90e-e378-4fa5-a520-d8d2ca81f6c2

The proposed method vs NMPC

https://github.com/shamilmamedov/flexible_arm/assets/59015432/b3b2f49b-bc46-4bc9-8e92-94a316daca1a

Installation of acados:

Install acados by following the instructions for its Python interface: https://docs.acados.org/python_interface/index.html

Imitation Library Fork (submodule):

As of 21 August 2023, the imitation library does not yet support Gymnasium, so we use our own fork of it with the necessary modifications.

After cloning this repo:

Hyperparameters of the IL, RL and IRL algorithms

| Hyper-parameter | Value |
| --- | --- |
| COMMON: Learning Rate | 0.0003 |
| COMMON: Number of Expert Demos | 100 |
| COMMON: Number of Training Steps | 2,000,000 |
| PPO: Net. Arch. | pi:[256, 256] vf:[256, 256] |
| PPO: Batch Size | 64 |
| SAC: Net. Arch. | pi:[256, 256] qf:[256, 256] |
| SAC: Batch Size | 256 |
| BC: Net. Arch. | pi:[32, 32] qf:[32, 32] |
| BC: Batch Size | 32 |
| DAgger: Online Episodes | 500 |
| Density: Kernel type | Gaussian |
| Density: Kernel bandwidth | 0.5 |
| Density: Net. Arch. | pi:[256, 256] qf:[256, 256] |
| GAIL: Reward Net Arch. | [32, 32] |
| GAIL: Policy Net Arch. | pi:[256, 256] qf:[256, 256] |
| GAIL: Policy Replay Buffer Capacity | 512 |
| GAIL: Batch Size | 128 |
| AIRL: Reward Net Arch. | [32, 32] |
| AIRL: Policy Net Arch. | pi:[256, 256] qf:[256, 256] |
| AIRL: Batch Size | 128 |
| AIRL: Policy Replay Buffer Capacity | 512 |
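
As a rough illustration of how the settings above might be grouped in code, the sketch below merges the common and per-algorithm values into keyword arguments. It assumes Stable-Baselines3-style constructors (which accept `learning_rate`, `batch_size` and `policy_kwargs`); the names `COMMON`, `ALGO_PARAMS` and `make_kwargs` are hypothetical and are not taken from this repository.

```python
# Hypothetical grouping of the table values; names are illustrative only.
COMMON = {
    "learning_rate": 3e-4,
    "n_expert_demos": 100,
    "total_timesteps": 2_000_000,
}

ALGO_PARAMS = {
    "PPO": {"batch_size": 64, "net_arch": {"pi": [256, 256], "vf": [256, 256]}},
    "SAC": {"batch_size": 256, "net_arch": {"pi": [256, 256], "qf": [256, 256]}},
    "BC": {"batch_size": 32, "net_arch": {"pi": [32, 32], "qf": [32, 32]}},
}


def make_kwargs(algo: str) -> dict:
    """Merge common and per-algorithm settings into constructor-style kwargs.

    Assumption: the target constructor takes `learning_rate`, `batch_size`
    and `policy_kwargs={"net_arch": ...}`, as Stable-Baselines3 PPO/SAC do.
    """
    params = ALGO_PARAMS[algo]
    return {
        "learning_rate": COMMON["learning_rate"],
        "batch_size": params["batch_size"],
        "policy_kwargs": {"net_arch": params["net_arch"]},
    }


ppo_kwargs = make_kwargs("PPO")
sac_kwargs = make_kwargs("SAC")
```

Under that assumption, the result could be unpacked into a constructor call such as `PPO("MlpPolicy", env, **ppo_kwargs)`, with `COMMON["total_timesteps"]` passed to `learn`.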

NMPC parameters

| Parameter | Value |
| --- | --- |
| Hessian Approximation | Gauss-Newton |
| SQP type | real-time iterations |
| $\Delta t$, $N$, $n_\mathrm{seg}$ | $5$ ms, $125$, $3$ |
| $Q$ weights $w_{q_a}$, $\dot{w}_{q_a}$, $w_{q_p}$, $\dot{w}_{q_p}$ | $0.01$, $0.1$, $0.01$, $10$ |
| $P_N$ | $\mathrm{diag}([1,1,1,0,0,0]) \cdot 10^4$ |
| $P$ | $\mathrm{diag}([1,1,1,0,0,0]) \cdot 2 \cdot 10^3$ |
| $R$ | $\mathrm{diag}([1,10,10])$ |
| $S$, $s$ | $\mathrm{diag}([1,1,1] \cdot 10^6)$, $[1,1,1]^\top \cdot 10^4$ |
| $\delta_\mathrm{ee}$, $\delta_\mathrm{elb}$, $\delta_\mathrm{x}$ | $0.01\,\mathrm{m}$, $0.005\,\mathrm{m}$, $0 \cdot 1_{n_x}$ |
| $\overline{\dot{q}}_a = -\underline{\dot{q}}_a$ | $[2.5, 3.5, 3.5]^\top\,\mathrm{s}^{-1}$ |
| $\overline{u} = -\underline{u}$ | $[20, 10, 10]^\top$ Nm |
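
To make the matrix notation concrete, here is a minimal pure-Python sketch that builds $P_N$, $P$ and $R$ from the listed values and checks a vector against the symmetric bounds. The helper names (`diag`, `quad_form`, `within_symmetric_bounds`) are hypothetical, and the sketch does not reproduce how the repository or acados assembles the cost.

```python
def diag(values, scale=1.0):
    """Return a dense diagonal matrix (list of lists) with scaled entries."""
    n = len(values)
    return [
        [scale * values[i] if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def quad_form(x, W):
    """Return x^T W x for a vector x and a square matrix W."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


def within_symmetric_bounds(x, x_max):
    """Check -x_max <= x <= x_max element-wise."""
    return all(-m <= xi <= m for xi, m in zip(x, x_max))


# Weight matrices as listed in the NMPC parameter table.
P_N = diag([1, 1, 1, 0, 0, 0], 1e4)
P = diag([1, 1, 1, 0, 0, 0], 2e3)
R = diag([1, 10, 10])

# Symmetric bounds from the table: torques in Nm, velocity limits in 1/s.
U_MAX = [20.0, 10.0, 10.0]
DQA_MAX = [2.5, 3.5, 3.5]

# Cost contribution of a unit input on every joint under R.
unit_input_cost = quad_form([1.0, 1.0, 1.0], R)
```

For example, `quad_form([1.0, 1.0, 1.0], R)` sums the diagonal of $R$ and gives 21, and `within_symmetric_bounds([21.0, 0.0, 0.0], U_MAX)` is false because the first torque exceeds 20 Nm.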

Safety Filter parameters

| Parameter | Value |
| --- | --- |
| $\Delta t_\mathrm{SF}$, $N_\mathrm{SF}$, $n_\mathrm{seg}$ | $10$ ms, $25$, $1$ |
| $\bar{R}$ | $\mathrm{diag}([1,1,1])$ |
| $R_\mathrm{SF}$ | $\mathrm{diag}([1,1,1]) \cdot 10^{-5}$ |
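
The step sizes and horizon lengths in the two parameter tables imply different look-ahead times. The short sketch below works out that arithmetic, assuming $N$ and $N_\mathrm{SF}$ count uniform steps of length $\Delta t$ and $\Delta t_\mathrm{SF}$ respectively; `horizon_seconds` is a hypothetical helper, not a function from this repository.

```python
def horizon_seconds(dt_ms: float, n_steps: int) -> float:
    """Prediction horizon in seconds for n_steps uniform steps of dt_ms milliseconds."""
    return dt_ms * n_steps / 1000.0


# NMPC: 5 ms steps, 125 steps (values from the NMPC parameter table).
nmpc_horizon = horizon_seconds(5, 125)

# Safety filter: 10 ms steps, 25 steps (values from the safety filter table).
sf_horizon = horizon_seconds(10, 25)

print(f"NMPC horizon: {nmpc_horizon} s, safety filter horizon: {sf_horizon} s")
```

Under this assumption the NMPC looks ahead 0.625 s, while the safety filter looks ahead 0.25 s with a fifth as many steps and a single-segment model ($n_\mathrm{seg} = 1$ instead of $3$).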