Closed mrchizx closed 4 years ago
Thank you for your interest in our paper. You are absolutely right, we train three different models for different K. We will include that later in our complete code. And yes, the K input/output pairs should be matched during meta-training and testing.
And for the second question, there is a difference between the inner update and the outer update (please refer to Fig. 2 in the paper for further details). For the outer update (iterating over the N scenes), we update the global parameters only once per iteration. I think your understanding is correct in this case.
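To make the inner/outer distinction concrete, here is a minimal first-order MAML-style sketch on a toy one-parameter regression problem. This is not the paper's code; the function names, the quadratic loss, and the learning rates are all illustrative assumptions. The point is only the structure: each scene gets K inner-loop updates on a copy of the global parameters, and the global parameters themselves are updated once per outer iteration, after all N scenes are processed.

```python
import numpy as np

def inner_update(theta, task_data, lr_inner=0.01, k_steps=1):
    """Adapt a copy of the global parameters theta to one scene
    with K inner-loop gradient steps (toy loss: mean((phi*x - y)^2))."""
    phi = theta.copy()
    x, y = task_data
    for _ in range(k_steps):
        grad = np.mean(2 * (phi * x - y) * x)  # d(loss)/d(phi)
        phi = phi - lr_inner * grad
    return phi

def outer_update(theta, tasks, lr_outer=0.001, k_steps=1):
    """One outer iteration: adapt to each of the N scenes, accumulate
    the post-adaptation (validation) gradients, then update the global
    parameters ONCE (first-order approximation, as in FOMAML)."""
    meta_grad = np.zeros_like(theta)
    for task_data in tasks:
        phi = inner_update(theta, task_data, k_steps=k_steps)
        x, y = task_data
        # gradient of the validation loss at the adapted parameters
        meta_grad += np.mean(2 * (phi * x - y) * x)
    return theta - lr_outer * meta_grad / len(tasks)
```

The full MAML objective would differentiate through the inner updates; the first-order variant above drops those second-order terms for clarity.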
Thanks for the reply. Also, I am trying to understand the code. Correct me if I am wrong:
for _, epoch_of_tasks in enumerate(train_dataloader):
    # this loop samples a batch from the training dataset
    for tidx, task in enumerate(epoch_of_tasks):
        # this loop takes one task at a time from the batch
        for kidx, frame_sequence in enumerate(task[:k_shots]):
            # for each task (video), you have multiple sequences
            # is this where the K updates come in?
            # it performs the inner-loop gradient update
        for vidx, val_frame_sequence in enumerate(task[-k_shots:]):
            # it performs inner-loop validation and records the gradient
        # the outer-loop gradient update takes place once the current batch is done
Thanks.
Your understanding is correct; k_shots refers to k image sequences (input/output pairs).
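The split in the loop above can be sketched as follows. This is a hypothetical helper, assuming each task holds at least 2*k_shots frame sequences, with the first k used for the inner-loop (support) updates and the last k for inner-loop validation (query):

```python
def split_task(task, k_shots):
    """Split one task's sequences into support and query sets
    (illustrative helper, not from the repo)."""
    support = task[:k_shots]   # used for the K inner-loop updates
    query = task[-k_shots:]    # used for inner-loop validation
    return support, query

# toy task with 10 "frame sequences"
task = list(range(10))
support, query = split_task(task, k_shots=5)
# support -> [0, 1, 2, 3, 4]; query -> [5, 6, 7, 8, 9]
```

Note that with fewer than 2*k_shots sequences per task the two slices would overlap, so the dataloader is assumed to guarantee enough sequences per task.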
Thanks. May I ask how many epochs you trained for pre-training and meta-training, respectively?
Sorry, I didn't see the question. For pre-training, our model converges pretty quickly, so only roughly 10 epochs are needed. For meta-training, we train for around 200 epochs until convergence.
Thanks. One more question: did you use the same optimizer for meta-testing as in meta-training? Did you change any parameters in the optimizer, such as weight decay?
Yes, meta-testing uses exactly the same settings as meta-training to keep everything consistent (including the optimizer and weight decay).
Thanks
Hi,
From what I understand, K stands for the number of gradient updates during test time. Does it mean that for k=1, 5, 10 you have three different models? Does it also mean that the K input/output pairs (in the sub-section 'Task in Meta-learning', page 6) should be matched during meta-training and testing?
Another question: N stands for the number of scenes per iteration, and K input/output pairs are sampled for each scene. Does it mean that during meta-training, the gradient is not updated until all N scenes are processed?
Thanks