Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots

Method

Nonlinear MPC performance

https://github.com/shamilmamedov/flexible_arm/assets/59015432/d1e1a90e-e378-4fa5-a520-d8d2ca81f6c2

The proposed method vs NMPC

https://github.com/shamilmamedov/flexible_arm/assets/59015432/b3b2f49b-bc46-4bc9-8e92-94a316daca1a

Installation of acados:

Installation of acados according to the following instructions: https://docs.acados.org/python_interface/index.html

Imitation Library Fork (submodule):

Current (21 August 2023) version on imitation library does not yet support Gymnasium. So we are using our own fork of it with necessary modifications.

After cloning this repo:

git submodule init
git submodule update
cd imitation
pip install -e .

Hyperparameters of the IL, RL and IRL algorithms

Hyper-parameter	Value
COMMON: Learning Rate	0.0003
COMMON: Number of Expert Demos	100
COMMON: Number of Training Steps	2,000,000
PPO: Net. Arch.	pi:[256, 256] vf:[256, 256]
PPO: Batch Size	64
SAC: Net. Arch.	pi:[256, 256] qf:[256, 256]
SAC: Batch Size	256
BC: Net. Arch.	pi:[32, 32] qf:[32, 32]
BC: Batch Size	32
DAgger: Online Episodes	500
Density: Kernel type	Gaussian
Density: Kernel bandwidth	0.5
Density: Net. Arch.	pi:[256, 256] qf:[256, 256]
GAIL: Reward Net Arch.	[32, 32]
GAIL: Policy Net Arch.	pi:[256, 256] qf:[256, 256]
GAIL: Policy Replay Buffer Capacity	512
GAIL: Batch Size	128
AIRL: Reward Net Arch.	[32, 32]
AIRL: Policy Net Arch.	pi:[256, 256] qf:[256, 256]
AIRL: Batch Size	128
AIRL: Policy Replay Buffer Capacity	512

NMPC parameters

Parameter	Value
Hessian Approximation	Gauss-Newton
SQP type	real-time iterations
$\Delta t$, $N$, $n_\mathrm{seg}$	$5$ ms, 125, 3
$Q$ weights $w_{qa}$, $\dot w{qa}$, $w{qp}$, $\dot{w}{q_p}$	$0.01 \; 0.1 \; 0.01 \; 10$
$P_N$	diag($[1,1,1,0,0,0])\cdot 10^4$
$P$	diag($[1,1,1,0,0,0])\cdot 2\cdot10^3$
$R$	diag($[1,10,10]$)
$S$, $s$	diag($[1,1,1]\cdot 10^6$), $[1,1,1]^\top\cdot 10^4$
$\delta\mathrm{ee}, \delta\mathrm{elb}$ , $\delta_\mathrm{x}$	$0.01\mathrm{m}, \;0.005\mathrm{m}$, \; $0\cdot 1_{n_x}$
$\overline{\dot{q_a}}=-\underline{\dot{q_a}}$	$[2.5, 3.5, 3.5]^\top\;s^{-1}$
$\overline{u}=-\underline{u}$	$[20,10,10]^\top$ Nm

Safety Filter parameters

Parameter	Value
$\Delta t\mathrm{SF}$, $N\mathrm{SF}$, $n_\mathrm{seg}$	$10$ ms, $25$, $1$
$\bar{R}$	diag($[1,1,1]$)
${R}_\mathrm{SF}$	diag($[1,1,1]$) $\cdot 10^{-5}$

shamilmamedov / flexible_arm

readme