chmxu opened this issue 4 years ago (status: Open)
Thanks for the inspiring work! I have a question about the implementation of RBP. In the original derivation, $v=\frac{\partial L^Q}{\partial W_b}$ (line 14 in Alg. 1), where $W_b$ is the classifier weight before the dummy GD step. However, the implementation at https://github.com/renmengye/inc-few-shot-attractor-public/blob/master/fewshot/models/rbp.py#L40 seems to compute the gradient w.r.t. the classifier weight after the dummy GD step. Is there a problem here?
Hi, there shouldn't be a problem with computing the gradient after the dummy GD step; in fact, it is by design. The code backprops through the dummy GD step multiple times to compute the gradient of the converging process.
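To make the "backprop through the dummy GD step multiple times" idea concrete, here is a minimal NumPy sketch (not the repo's code) of Neumann-series RBP on a toy quadratic inner problem. All names (`A`, `b`, `c`, `eta`) are hypothetical; the assumption is that the inner GD update $F(w) = w - \eta \nabla f(w)$ has converged to a fixed point, so repeatedly applying the transposed Jacobian $J^\top$ of that single step to $v$ and summing approximates $(I - J^\top)^{-1} v$, the gradient through the whole converging process.

```python
import numpy as np

# Hypothetical quadratic inner problem: f(w) = 0.5 w^T A w - b^T w.
# GD fixed point: w* = A^{-1} b.  Outer loss: L = 0.5 ||w* - c||^2.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite
b = rng.standard_normal(4)
c = rng.standard_normal(4)
eta = 0.05                         # inner GD step size

# Run the inner GD to (near) convergence.
w = np.zeros(4)
for _ in range(2000):
    w = w - eta * (A @ w - b)      # one "dummy" GD step F(w)

# Neumann-series RBP: backprop through the *same* GD step repeatedly.
# J = dF/dw = I - eta*A is the Jacobian of one step at the fixed point.
v = w - c                          # dL/dw*, taken at the converged weights
g = np.zeros(4)                    # accumulates sum_k (J^T)^k v
for _ in range(500):
    g = g + v
    v = v - eta * (A @ v)          # v <- J^T v  (A symmetric, so J^T = J)

# dF/db = eta*I, so dL/db = eta * g; analytically this equals A^{-1}(w* - c).
grad_rbp = eta * g
grad_exact = np.linalg.solve(A, w - c)
print(np.allclose(grad_rbp, grad_exact, atol=1e-5))
```

Note that the vector $v$ is seeded with the gradient evaluated at the converged (post-GD) weights, which matches why the linked implementation differentiates after the dummy GD step: at a fixed point the pre- and post-step weights coincide, and what matters is propagating $v$ back through the step's Jacobian.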