Hi! Sorry for the late reply.
In TensorBoard, the entry `before_update` corresponds to the average return before the fast adaptation (number of gradient steps = 0 in Figure 5 of the original paper), and `after_update` corresponds to the average return after the fast adaptation (number of gradient steps = 1 in Figure 5).
Also a note: at the moment, the number of gradient updates (for fast adaptation) is fixed in the code: you can only adapt with a single gradient step. I hope to make it more flexible in the future.
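For concreteness, here is a minimal sketch of what those two scalars aggregate in this single-gradient-step setup. The `episodes` variable and the `total_rewards` helper below are assumptions modeled on the training-script snippet quoted at the bottom of this thread; the actual helper in the repository may differ slightly:

```python
import torch

def total_rewards(episodes_rewards):
    # Average undiscounted return: sum rewards over time for each
    # episode, then average across all episodes and tasks.
    # (Sketch of the helper used in the quoted snippet, not the
    # repository's exact implementation.)
    returns = torch.cat([rewards.sum(dim=0) for rewards in episodes_rewards], dim=0)
    return returns.mean().item()

# `episodes` is assumed to be a list of (train, valid) pairs, one per task:
# the train episodes are collected with the pre-adaptation parameters,
# the valid episodes with the parameters after one gradient step.
before_update = total_rewards([ep.rewards for ep, _ in episodes])  # 0 steps
after_update = total_rewards([ep.rewards for _, ep in episodes])   # 1 step
```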
The new version of the code now includes an option to have multiple gradient steps for adaptation. You can access the corresponding samples for each gradient step, following `test.py`:
`train_episodes` is a list whose length is the number of gradient steps, and each element is itself a list of length `meta_batch_size` containing the different episodes. For example, `train_episodes[0]` contains the episodes before any gradient update, `train_episodes[1]` the episodes after 1 gradient update (if the number of adaptation steps is > 1), and so on. `valid_episodes` is a list containing the episodes after all the steps of adaptation. You can use the `get_returns` function to get the returns for the different episodes.
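As a rough illustration, something along these lines should print the average return at each adaptation step. The import path and the exact return shape of `get_returns` are assumptions based on my reading of `test.py`; adjust them to the actual code:

```python
import numpy as np

# Assumed import path, following test.py; adjust if it differs.
from maml_rl.utils.reinforcement_learning import get_returns

# train_episodes / valid_episodes as returned by the sampler in test.py:
# train_episodes[k] holds the meta-batch of episodes collected after
# k gradient steps; valid_episodes holds those after all steps.
for step, episodes in enumerate(train_episodes):
    returns = np.asarray(get_returns(episodes))
    print(f'after {step} gradient step(s): '
          f'average return = {returns.mean():.2f}')

valid_returns = np.asarray(get_returns(valid_episodes))
print(f'after all adaptation steps: '
      f'average return = {valid_returns.mean():.2f}')
```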
I am confused about the before- and after-update rewards in TensorBoard.
```python
# Tensorboard
writer.add_scalar('total_rewards/before_update',
                  total_rewards([ep.rewards for ep, _ in episodes]), batch)
writer.add_scalar('total_rewards/after_update',
                  total_rewards([ep.rewards for _, ep in episodes]), batch)
```
I mean, I wanted to understand how to train the model on a new environment in 2 or 3 gradient steps and then check the reward. Is this what the `after_update` rewards refer to?