tristandeleu / pytorch-maml-rl

Reinforcement Learning with Model-Agnostic Meta-Learning in Pytorch
MIT License

Questions about the output files #43

Closed shiqichen17 closed 4 years ago

shiqichen17 commented 4 years ago

This question may be a bit silly, but I cannot figure out what the output files mean and how to draw the figures from your paper. The results consist of three arrays: tasks, train_returns, and valid_returns. To obtain the average returns, should I compute the mean of valid_returns? And what about the returns before the update: are they computed by averaging train_returns? Thank you so much for any help you can provide.

tristandeleu commented 4 years ago

The array valid_returns contains the returns for the episodes sampled after adaptation. This array has size (meta_batch_size, fast_batch_size). Taking the average and standard error over the whole valid_returns array will give you numbers that correspond to the value for 1 gradient step (assuming you are doing 1 gradient step adaptation) from Figure 5 in the MAML paper. If you take the average and standard error over the second axis only (valid_returns.mean(1)), you can get the orange curve in Figure 1 in the Negative adaptation in MAML paper (and use the tasks list to know which task the values correspond to).
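
For example, here is a minimal sketch of these statistics, assuming the three arrays were saved together with np.savez (the filename results.npz below is a placeholder; point it at wherever your results were saved):

```python
import numpy as np

# Placeholder filename: adjust to your actual results file.
# allow_pickle=True is needed if the tasks array stores Python objects.
results = np.load('results.npz', allow_pickle=True)

# Shape: (meta_batch_size, fast_batch_size)
valid_returns = results['valid_returns']

# Average return after 1 gradient step of adaptation
# (cf. "1 gradient step" in Figure 5 of the MAML paper).
mean_return = valid_returns.mean()
# Standard error over all sampled episodes.
std_error = valid_returns.std() / np.sqrt(valid_returns.size)

# One average per task (cf. the orange curve in Figure 1 of the
# Negative adaptation in MAML paper); pair these with the tasks list.
per_task_returns = valid_returns.mean(axis=1)
```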

The array train_returns is an array with the same shape, containing the returns before any adaptation step. With the same statistics as above, it corresponds to 0 gradient steps in Figure 5 of the MAML paper and the blue curve in Figure 1 of the Negative adaptation in MAML paper.
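
Continuing the sketch above, the pre-adaptation statistics come from train_returns in exactly the same way, which also makes it easy to compare 0 vs. 1 gradient step for each task:

```python
# Same shape as valid_returns: (meta_batch_size, fast_batch_size).
train_returns = results['train_returns']
tasks = results['tasks']

# Returns before any adaptation ("0 gradient steps" in Figure 5).
print(f'0 steps: {train_returns.mean():.2f}')
print(f'1 step:  {valid_returns.mean():.2f}')

# Per-task comparison, paired with the task each row corresponds to.
for task, before, after in zip(tasks, train_returns.mean(1), valid_returns.mean(1)):
    print(f'{task}: {before:.2f} -> {after:.2f}')
```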

shiqichen17 commented 4 years ago

Thanks a lot for your kind reply! That's really helpful. I still have one more question, though. In the Python file maml_trpo.py, what is the role of the function async def adapt(self, train_futures, first_order=None)? I thought the adaptation process happened in the multi_task_sampler file. Thanks again for your time and patience.

tristandeleu commented 4 years ago

There is indeed adaptation in two different places:

- In MultiTaskSampler, where the policy is adapted to each task in order to sample the post-adaptation episodes (the ones behind valid_returns).
- In maml_trpo (the adapt function), where the adaptation is recomputed during optimization, so that the meta-gradient can be backpropagated through the inner-loop update.

It may look wasteful to do the adaptation twice (once for sampling episodes, once again for optimization), but this allows a complete decoupling of sampling and optimization, and makes the overall process significantly faster: an earlier version of this repo, which had sampling and optimization entangled, was 10x slower.
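
Schematically, the decoupled pattern looks like the toy sketch below (hypothetical names, not this repo's actual API):

```python
# Toy sketch of the two-phase pipeline: adaptation is done once to
# sample episodes, and once again during optimization so gradients
# can flow through the inner-loop update.

class Sampler:
    def sample(self, tasks):
        # Inner-loop adaptation happens here, purely to collect data;
        # nothing from this phase needs to stay in a computation graph.
        return ['train episodes per task'], ['valid episodes per task']

class MetaLearner:
    def adapt(self, train_episodes):
        # Adaptation is recomputed here, inside the autograd graph.
        return 'adapted params'

    def step(self, train_episodes, valid_episodes):
        params = self.adapt(train_episodes)
        # ...then a TRPO meta-update on the post-adaptation loss,
        # evaluated with `params` on the validation episodes.

sampler, metalearner = Sampler(), MetaLearner()
train_eps, valid_eps = sampler.sample(['task-0'])
metalearner.step(train_eps, valid_eps)
```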

shiqichen17 commented 4 years ago

Got it, thank you so much!!!