Learning from demonstration is also known as "programming by demonstration", "imitation learning", and "teaching by showing". The goal is to replace the time-consuming manual programming of a robot with an automatic programming process, driven solely by an expert showing the robot the assembly task.
This paper investigates how learning from demonstration can be applied in the context of reinforcement learning.
Through an implementation of pole balancing on a complex anthropomorphic robot arm, the paper demonstrates that, when facing the complexities of real signal processing, model-based RL offers the most robustness for LQR problems.
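To make the LQR connection concrete, here is a minimal model-based sketch: given a (hypothetical, illustrative) linearized pole model rather than the paper's identified dynamics, the optimal balancing gain follows from the discrete algebraic Riccati equation.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical discretized pole-near-upright dynamics x_{t+1} = A x_t + B u_t,
# state = [angle, angular velocity]; numbers are illustrative, not the paper's.
dt, g, l = 0.02, 9.81, 0.5
A = np.array([[1.0, dt],
              [g / l * dt, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.eye(2)           # quadratic state cost
R = np.array([[0.1]])   # quadratic action cost

# Model-based step: solve the Riccati equation, then the optimal gain u = -K x.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# LQR guarantees a stable closed loop: spectral radius of (A - B K) < 1.
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0)
```

The point of the model-based approach is exactly this: once (A, B) are learned from data (e.g. from the demonstration), the policy drops out of the Riccati solution instead of being estimated by trial and error.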
Two examples:
For both tasks, the learner is given information about the one-step reward r (Fig. 1), and both tasks are formulated as continuous state, continuous action problems. The goal of each task is to find a policy that minimizes the infinite horizon discounted reward:

$$V(x(t)) = \int_t^{\infty} e^{-\frac{s-t}{\tau}} r(x(s), u(s))\, ds \qquad\qquad V(x(t)) = \sum_{i=t}^{\infty} \gamma^{\,i-t}\, r(x_i, u_i)$$

where the left-hand equation is the continuous time formulation, and the right-hand equation is the corresponding discrete time version.
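The discrete time version of the discounted criterion is easy to sketch numerically; the discount factor and reward sequence below are made-up illustration values, not from the paper.

```python
# Discrete-time discounted return: sum over i >= t of gamma^(i - t) * r_i,
# truncated here to a finite reward sequence for illustration.
def discounted_return(rewards, gamma=0.95):
    return sum(gamma**i * r for i, r in enumerate(rewards))

# Three steps of unit reward: 1 + 0.95 + 0.95^2 = 2.8525
print(discounted_return([1.0, 1.0, 1.0]))
```

Note that with a cost-like one-step reward r (as in these tasks), the policy should minimize this quantity rather than maximize it.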
![screenshot from 2018-12-10 17-15-21](https://user-images.githubusercontent.com/11659104/49745455-460c2200-fc9f-11e8-858a-c92ea40e065b.png)
The nonlinear task: swing-up
1.1 V-learning (model unknown)
1.2 Model-based V-learning
![image](https://user-images.githubusercontent.com/11659104/49746912-88832e00-fca2-11e8-9763-f3e3993956d6.png)
The linear task: cart-pole balancing
2.1 Q-learning
2.2 Model-based V-learning
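For contrast with the model-based variants, a minimal model-free Q-learning update can be sketched as below. This is a tabular toy (states and actions indexed by integers, hypothetical step values); the paper's tasks are continuous, so Q there is represented by a function approximator instead.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference update of the Q-table.

    Uses min over next actions because the one-step reward r is a cost
    to be minimized in these tasks.
    """
    td_target = r + gamma * np.min(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy example: 3 states, 2 actions, one observed transition (made-up values).
Q = np.zeros((3, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.1 * (1.0 + 0.95 * 0 - 0) = 0.1
```

The model-free update needs only sampled transitions (s, a, r, s'), which is precisely why it struggles more than the model-based methods when the real sensed signals are noisy and delayed.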