Hello @NookLook2014,

Yes, the two are the same, but accumulating gradients is much cheaper than accumulating loss because you can free the activations.

One issue in your code: you accumulate `evaluation_error.item()` but should accumulate `evaluation_error`.
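
For anyone reading along, here is a minimal sketch of the two patterns being compared. It is plain PyTorch rather than the example code from this repo: `compute_task_loss` is a hypothetical helper standing in for the per-task inner-loop adaptation, and `opt`, `tasks`, and `meta_bsz` are assumed to exist in the surrounding script.

```python
def meta_step_accumulate_gradients(opt, tasks, compute_task_loss, meta_bsz):
    # Cheaper: calling backward() per task frees that task's activation graph
    # immediately, and the per-task gradients simply sum into .grad.
    opt.zero_grad()
    for task in tasks:
        evaluation_error = compute_task_loss(task)   # differentiable meta-loss for one task
        (evaluation_error / meta_bsz).backward()     # this task's graph is freed here
    opt.step()

def meta_step_accumulate_loss(opt, tasks, compute_task_loss, meta_bsz):
    # Mathematically the same update, but every task's activation graph stays
    # alive until the single backward(), so peak memory grows with meta_bsz.
    opt.zero_grad()
    total_error = 0.0
    for task in tasks:
        total_error = total_error + compute_task_loss(task)  # keep the tensor, not .item()
    (total_error / meta_bsz).backward()
    opt.step()
```

This is also why the `.item()` point matters for the loss-accumulation variant: `.item()` returns a detached Python float, so a sum of `.item()` values has no graph to backpropagate through.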
Thanks for confirming. As for the issue, I used `evaluation_error`, but I still ran into CUDA out-of-memory errors even when `meta_bsz` is quite small.

With the following code, I replaced the original pattern of first accumulating gradients and then averaging them with first accumulating the loss and then computing gradients once. It also works and is much faster, but I'm not very familiar with meta-learning, so I'm not sure my approach has the same effect as the typical one in the example code.
```python
for iteration in range(1, num_iterations+1):
    opt.zero_grad()
    meta_train_error = 0.0
    meta_train_accuracy = 0.0