"We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation."
"Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. "
"Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task)."
Comparison with previous research. What are the novelties/good points?
"While a number of prior works have explored uncertainty-aware deep neural network models [Neal, 1995, Lakshminarayanan et al., 2017], including in the context of RL [Gal et al., 2016, Depeweg et al., 2016], our work is, to our knowledge, the first to bring these components together in a deep MBRL framework that reaches the asymptotic performance of state-of-the-art model-free RL methods on benchmark control tasks."
"these components" == ensembling and outputting Gaussian distribution parameters
Key points
Two types of uncertainty:
Aleatoric uncertainty
Arises from inherent stochasticities of a system (e.g. observation noise and process noise)
Captured by having the network output the parameters of a predictive distribution (here, the mean and variance of a Gaussian)
Epistemic uncertainty
Corresponds to subjective uncertainty about the dynamics function, due to a lack of sufficient data to uniquely determine the underlying system exactly
In the limit of infinite data, epistemic uncertainty should vanish
Captured by ensembling: the disagreement between bootstrap ensemble members reflects epistemic uncertainty (see the sketch after this list)
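A small sketch of how the two kinds of uncertainty can be read off such an ensemble (an illustrative decomposition under the interface assumed in the sketch above, not a formula from the paper): each member's predicted variance reflects aleatoric noise, while the spread of the members' means reflects epistemic uncertainty and shrinks as more data is collected.

```python
import torch

@torch.no_grad()
def split_uncertainty(ensemble, state, action):
    """Illustrative decomposition of the ensemble's predictive variance,
    assuming the (state, action) -> (mean, log_var) interface sketched above."""
    means, variances = [], []
    for model in ensemble:
        mean, log_var = model(state, action)
        means.append(mean)
        variances.append(log_var.exp())
    means = torch.stack(means)          # shape (B, state_dim)
    variances = torch.stack(variances)  # shape (B, state_dim)

    aleatoric = variances.mean(dim=0)             # average predicted noise
    epistemic = means.var(dim=0, unbiased=False)  # disagreement between members
    return aleatoric, epistemic
```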
How did the authors prove the effectiveness of the proposal?
Comparison to state-of-the-art model-based and model-free deep RL algorithms
Showed that "our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task)."
Summary
Link
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models Official Code
Author/Institution
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine (UC Berkeley)