katerakelly / oyster

Implementation of Efficient Off-policy Meta-learning via Probabilistic Context Variables (PEARL)
MIT License

Run training environments at evaluation time? #7

Closed · xlnwel closed this issue 5 years ago

xlnwel commented 5 years ago

Hi, thanks for your great work. It seems, though, that you run the training environments at evaluation time: https://github.com/katerakelly/oyster/blob/cd09c1ae0e69537ca83004ca569574ea80cf3b9c/rlkit/core/rl_algorithm.py#L413

azhou42 commented 5 years ago

That is correct: that section of the code evaluates the train tasks to see how well the policy performs given the embeddings it is optimized with during meta-training (rather than embeddings computed from freshly gathered exploratory data, as in the actual test setting).

The evaluation of test tasks occurs here: https://github.com/katerakelly/oyster/blob/cd09c1ae0e69537ca83004ca569574ea80cf3b9c/rlkit/core/rl_algorithm.py#L444
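For readers landing here, a minimal toy sketch of the distinction may help. This is not the repository's code: every name below (`ToyMetaAgent`, `rollout`, `evaluate_train_task`, `evaluate_test_task`) is a hypothetical placeholder, and the dynamics and posterior are stand-ins. It only illustrates the structural difference between the two paths linked above: train tasks are evaluated with an embedding inferred from replayed meta-training data, while test tasks first require fresh exploratory rollouts under the prior before the posterior over the latent context is inferred.

```python
"""Toy sketch (assumed names, not the repository's actual API) contrasting
train-task evaluation with embeddings from replay data versus test-task
evaluation with embeddings inferred from fresh exploratory rollouts."""
import random
from typing import List, Tuple

Transition = Tuple[float, float, float]  # (obs, action, reward), toy scalars


class ToyMetaAgent:
    """Stand-in for a PEARL-style agent with a latent context variable z."""

    def sample_prior(self) -> float:
        return random.gauss(0.0, 1.0)  # z ~ p(z)

    def infer_posterior(self, context: List[Transition]) -> float:
        # Toy posterior: the mean context reward stands in for q(z | context).
        rewards = [r for (_, _, r) in context]
        return sum(rewards) / len(rewards)

    def act(self, obs: float, z: float) -> float:
        return obs + z  # policy conditioned on the latent z


def rollout(agent: ToyMetaAgent, z: float, horizon: int = 5) -> List[Transition]:
    """Toy environment loop; a real rollout would step a gym env."""
    traj, obs = [], 0.0
    for _ in range(horizon):
        action = agent.act(obs, z)
        reward = -abs(action)  # arbitrary toy reward
        traj.append((obs, action, reward))
        obs += 0.1
    return traj


def evaluate_train_task(agent: ToyMetaAgent, replay_context: List[Transition]):
    # Train-task eval: z is inferred from replayed meta-training data, the
    # same kind of embedding the policy is optimized with (the L413 path).
    z = agent.infer_posterior(replay_context)
    return rollout(agent, z)


def evaluate_test_task(agent: ToyMetaAgent, num_exploration_episodes: int = 2):
    # Test-task eval: gather fresh exploratory data with z from the prior,
    # then condition the policy on the inferred posterior (the L444 path).
    context: List[Transition] = []
    for _ in range(num_exploration_episodes):
        context.extend(rollout(agent, agent.sample_prior()))
    z = agent.infer_posterior(context)
    return rollout(agent, z)


if __name__ == "__main__":
    agent = ToyMetaAgent()
    replay = rollout(agent, agent.sample_prior())  # pretend replay-buffer data
    print("train-task return:", sum(r for *_, r in evaluate_train_task(agent, replay)))
    print("test-task return:", sum(r for *_, r in evaluate_test_task(agent)))
```

The toy only mirrors the structure of the two code paths in `rlkit/core/rl_algorithm.py` linked above; the actual implementation samples contexts per task from the replay buffer and accumulates exploration trajectories to sharpen the posterior.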

xlnwel commented 5 years ago

Thanks, my oversight. I missed that.