I did not quite understand the tasks and the meta mechanism used in the paper, but with the author's replies I got a clearer view. The conclusions and discussion are listed below to help others.
In the pseudocode, the rollouts are divided into two parts. Does this mean the two parts of the rollouts are treated as two tasks in meta-learning? If so, what is the intuition behind this treatment?
A: In meta-learning terminology, fitting the baseline for each external input process is our task. What we want is to quickly adapt the baseline fitting to a new input sequence after seeing it only a few times. Note that the two parts of the rollouts experience the same input sequence, hence they belong to the same "task". The intuition behind this treatment is that we want the baseline to adapt based on one part of the experience (meta gradient step) so that it can fit the second part quickly (fine-tune gradient step). We swap the two parts during training simply to create more data.
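To make the two-part mechanism concrete, here is a minimal sketch of that idea, assuming a linear baseline fit by squared error and a first-order MAML-style update; all names (`meta_step`, `inner_lr`, `meta_lr`, the rollout halves) are illustrative placeholders, not the paper's actual code or exact update rule.

```python
# Hypothetical sketch: adapt on one half of the rollouts, then update the
# initial parameters so the adapted baseline fits the other half quickly.
import numpy as np

def baseline_grad(params, states, returns):
    """Gradient of mean squared error for a linear baseline: preds = states @ params."""
    preds = states @ params
    return 2.0 * states.T @ (preds - returns) / len(returns)

def meta_step(params, part_a, part_b, inner_lr=0.1, meta_lr=0.01):
    """One meta-update for a single task (one input sequence), split into two halves."""
    states_a, returns_a = part_a
    states_b, returns_b = part_b

    # Adaptation step on the first half of the experience.
    adapted = params - inner_lr * baseline_grad(params, states_a, returns_a)

    # Update the initial params using the gradient at the adapted params on the
    # second half (first-order approximation that ignores second-order terms).
    return params - meta_lr * baseline_grad(adapted, states_b, returns_b)

# Toy usage: both halves come from the same input sequence, i.e. the same "task";
# swapping the halves just creates more training data.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 8))
returns = states @ rng.normal(size=8) + 0.1 * rng.normal(size=100)
part_a = (states[:50], returns[:50])
part_b = (states[50:], returns[50:])
params = np.zeros(8)
params = meta_step(params, part_a, part_b)
params = meta_step(params, part_b, part_a)  # swapped halves, same task
```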
I think the multi-baseline model (in Appendix I) is easier to understand; the idea for the meta baseline really extends from there.
From my understanding, the baseline should generalize across different inputs, so it seems more reasonable to treat different inputs as different tasks and to sample a batch of inputs during training.
A: You are exactly right! We are treating different inputs as different tasks. For simplicity, our paper only samples the same inputs at each training step, and we rely on stochastic gradient descent to make the model converge across the different tasks.
Intuitively, performance should be better if we sample and mix different tasks in each batch.
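As a rough illustration of mixing tasks per batch, the sketch below samples several input sequences ("tasks") per update and averages the resulting updates, reusing the hypothetical `meta_step` from the sketch above; `task_pool`, `rng`, and `batch_size` are assumptions, not the paper's API.

```python
# Hypothetical sketch: each task is a different external input sequence, stored
# as its two rollout halves. We sample a batch of tasks and average the updates.
import numpy as np

def meta_train_step(params, task_pool, rng, batch_size=4):
    """Average the meta-updated params over a batch of sampled tasks."""
    updates = []
    for idx in rng.choice(len(task_pool), size=batch_size, replace=False):
        part_a, part_b = task_pool[idx]                     # two halves of one input sequence
        updates.append(meta_step(params, part_a, part_b))
        updates.append(meta_step(params, part_b, part_a))   # swap halves for more data
    return np.mean(updates, axis=0)
```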