Hi! Sorry for the late reply.
In TensorBoard, the entry `before_update` corresponds to the average return before the fast adaptation (number of gradient steps = 0 in Figure 5 of the original paper), and `after_update` corresponds to the average return after the fast adaptation (number of gradient steps = 1 in Figure 5).
Also a note: at the moment, the number of gradient updates (for fast adaptation) is fixed in the code: you can only adapt with a single gradient step. I hope to make it more flexible in the future.
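For concreteness, here is a minimal sketch of what those two scalars aggregate in this single-gradient-step setup. The `episodes` variable and the `total_rewards` helper below are assumptions modeled on the training-script snippet quoted at the bottom of this thread; the actual helper in the repository may differ slightly:

```python
import torch

def total_rewards(episodes_rewards):
    # Average undiscounted return: sum rewards over time for each
    # episode, then average across all episodes and tasks.
    # (Sketch of the helper used in the quoted snippet, not the
    # repository's exact implementation.)
    returns = torch.cat([rewards.sum(dim=0) for rewards in episodes_rewards], dim=0)
    return returns.mean().item()

# `episodes` is assumed to be a list of (train, valid) pairs, one per task:
# the train episodes are collected with the pre-adaptation parameters,
# the valid episodes with the parameters after one gradient step.
before_update = total_rewards([ep.rewards for ep, _ in episodes])  # 0 steps
after_update = total_rewards([ep.rewards for _, ep in episodes])   # 1 step
```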
The new version of the code now includes an option to have multiple gradient steps for adaptation. You can access the corresponding samples for each gradient step, following `test.py`:
`train_episodes` is a list whose length is the number of gradient steps, and each element is itself a list of length `meta_batch_size` containing the different episodes. For example, `train_episodes[0]` contains the episodes before any gradient update, `train_episodes[1]` the episodes after 1 gradient update (if the number of adaptation steps is > 1), and so on. `valid_episodes` is a list containing the episodes after all the steps of adaptation. You can use the `get_returns` function to get the returns for the different episodes.
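As a rough illustration, something along these lines should print the average return at each adaptation step. The import path and the exact return shape of `get_returns` are assumptions based on my reading of `test.py`; adjust them to the actual code:

```python
import numpy as np

# Assumed import path, following test.py; adjust if it differs.
from maml_rl.utils.reinforcement_learning import get_returns

# train_episodes / valid_episodes as returned by the sampler in test.py:
# train_episodes[k] holds the meta-batch of episodes collected after
# k gradient steps; valid_episodes holds those after all steps.
for step, episodes in enumerate(train_episodes):
    returns = np.asarray(get_returns(episodes))
    print(f'after {step} gradient step(s): '
          f'average return = {returns.mean():.2f}')

valid_returns = np.asarray(get_returns(valid_episodes))
print(f'after all adaptation steps: '
      f'average return = {valid_returns.mean():.2f}')
```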
I am confused about the before- and after-update rewards in TensorBoard.
```python
# Tensorboard
writer.add_scalar('total_rewards/before_update',
                  total_rewards([ep.rewards for ep, _ in episodes]), batch)
writer.add_scalar('total_rewards/after_update',
                  total_rewards([ep.rewards for _, ep in episodes]), batch)
```
I mean, I wanted to understand how to train the model on a new environment in 2 or 3 gradient steps and then check the reward. Is this what the `after_update` rewards refer to?