Thanks for your great work and code! I noticed that at training time in most environments, only data collected with z sampled from the prior is added to the encoder buffer --- num_steps_posterior is set to zero for these environments. What's the reasoning behind this decision? Why not also include data collected with z sampled from the posterior in the encoder buffer?
We found this setting worked better for these shaped reward environments, in which exploration doesn't seem to be crucial for identifying and solving the task.
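For anyone else reading this thread, here is a toy sketch of how these two settings interact during collection. All names (`Sampler`, `collect_iteration`, the buffer lists) are illustrative stand-ins, not the repo's actual API; the point is just that with `num_steps_posterior = 0`, nothing collected under the posterior ever reaches the encoder buffer:

```python
class Sampler:
    """Toy stand-in: "collects" n transitions, tagged with how z was sampled."""
    def collect(self, n, z_source):
        return [{"z_source": z_source, "step": i} for i in range(n)]

def collect_iteration(sampler, encoder_buffer, rl_buffer,
                      num_steps_prior=400, num_steps_posterior=0):
    # Steps gathered with z ~ prior: added to both the RL buffer and
    # the encoder buffer.
    prior_data = sampler.collect(num_steps_prior, z_source="prior")
    rl_buffer.extend(prior_data)
    encoder_buffer.extend(prior_data)

    # Steps gathered with z ~ posterior: with num_steps_posterior = 0
    # (the setting asked about), this collects nothing, so the encoder
    # buffer only ever sees prior-sampled data.
    posterior_data = sampler.collect(num_steps_posterior, z_source="posterior")
    rl_buffer.extend(posterior_data)
    encoder_buffer.extend(posterior_data)

encoder_buffer, rl_buffer = [], []
collect_iteration(Sampler(), encoder_buffer, rl_buffer)
```

After one iteration with the defaults above, every transition in `encoder_buffer` is prior-sampled, which is the behavior the question describes.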