
Learning Latent Dynamics for Planning from Pixels #32



Summary

Link

Learning Latent Dynamics for Planning from Pixels

Official repo: google-research/planet

Author/Institution

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson (Google Brain, University of Toronto, DeepMind, Google Research, University of Michigan)

What is this

"Model" in this architecture refers three thigs:

and policy $p(a_t|o_t,a_t)$ aimes to maximize the expected sum of rewards.

*(screenshot of the model definition from the paper)*
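As a purely illustrative sketch, the three components can be written as small networks over a latent state. Everything below, including the class name `LatentModel` and all layer sizes, is an assumption rather than the paper's exact architecture (which uses a convolutional encoder and deconvolutional decoder for pixels):

```python
import torch
import torch.nn as nn

class LatentModel(nn.Module):
    """Minimal sketch of the three model components over a latent state s_t.
    Architectures and sizes are illustrative assumptions, not the paper's
    exact networks."""

    def __init__(self, state_dim=30, action_dim=1, obs_dim=1024):
        super().__init__()
        # transition model p(s_t | s_{t-1}, a_{t-1}): diagonal Gaussian over s_t
        self.transition = nn.Sequential(
            nn.Linear(state_dim + action_dim, 200), nn.ELU(),
            nn.Linear(200, 2 * state_dim))  # outputs mean and pre-softplus std
        # observation model p(o_t | s_t): decodes the latent back to an observation
        self.observation = nn.Sequential(
            nn.Linear(state_dim, 200), nn.ELU(),
            nn.Linear(200, obs_dim))
        # reward model p(r_t | s_t): predicts a scalar reward from the latent
        self.reward = nn.Sequential(
            nn.Linear(state_dim, 200), nn.ELU(),
            nn.Linear(200, 1))

    def sample_next_state(self, state, action):
        # Sample s_t ~ p(s_t | s_{t-1}, a_{t-1}) via the reparameterization trick.
        mean, pre_std = self.transition(
            torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        std = nn.functional.softplus(pre_std) + 0.1  # keep std positive
        return mean + std * torch.randn_like(std)
```

Because the reward is predicted directly from the latent state, imagined trajectories can be rolled out and scored without ever decoding back to pixels, which is what makes planning in latent space cheap.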

Comparison with previous research: what are the novelties/good points?

Key points

Regarding the recurrent network used for planning, they claim the following:

> our experiments show that both stochastic and deterministic paths in the transition model are crucial for successful planning

The network architecture, shown in Figure 2 (c), is called the recurrent state-space model (RSSM). *(screenshot of Figure 2 from the paper)*
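To make the quoted claim concrete: the RSSM splits the state into a deterministic part $h_t$, carried by a GRU, and a stochastic part $s_t$, sampled from a Gaussian conditioned on $h_t$. The following is a minimal sketch under assumed dimensions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class RSSMCell(nn.Module):
    """One transition step of a recurrent state-space model (RSSM):
    a deterministic GRU path plus a stochastic Gaussian latent.
    All sizes here are illustrative assumptions."""

    def __init__(self, stoch_dim=30, deter_dim=200, action_dim=1, hidden_dim=200):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(stoch_dim + action_dim, hidden_dim), nn.ELU())
        # deterministic path: h_t = f(h_{t-1}, s_{t-1}, a_{t-1})
        self.gru = nn.GRUCell(hidden_dim, deter_dim)
        # stochastic path: prior p(s_t | h_t) as a diagonal Gaussian
        self.prior = nn.Sequential(
            nn.Linear(deter_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, 2 * stoch_dim))

    def forward(self, prev_stoch, prev_deter, action):
        x = self.embed(torch.cat([prev_stoch, action], dim=-1))
        deter = self.gru(x, prev_deter)             # deterministic state h_t
        mean, pre_std = self.prior(deter).chunk(2, dim=-1)
        std = nn.functional.softplus(pre_std) + 0.1
        stoch = mean + std * torch.randn_like(std)  # stochastic state s_t
        return stoch, deter
```

The deterministic path lets the model carry information reliably across many steps, while the stochastic path captures multiple possible futures; the paper's ablations (the purely deterministic and purely stochastic variants in Figure 2 (a)/(b)) show that dropping either hurts planning.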

How did the authors demonstrate the effectiveness of the proposal?

Experiments on continuous control tasks: Cartpole Swing Up, Reacher Easy, Cheetah Run, Finger Spin, Cup Catch, and Walker Walk from the DeepMind Control Suite.

They confirmed that the proposed model achieves performance comparable to the best model-free algorithms while using 200× fewer episodes and similar or less computation time.

Any discussions?

What should I read next?

Broader contextual review: