chmxu opened this issue 4 years ago (status: Open)
Thanks for the inspiring work! I have a question about the implementation of RBP. In the original derivation, $v=\frac{\partial L^Q}{\partial W_b}$ (line 14 in Alg. 1), where $W_b$ is the classifier weight before the dummy GD step. However, the implementation at https://github.com/renmengye/inc-few-shot-attractor-public/blob/master/fewshot/models/rbp.py#L40 seems to compute the gradient w.r.t. the classifier weight after the dummy GD step. Is there a problem here?
Hi, there shouldn't be a problem with computing the gradient after the dummy GD step; in fact, it is by design. The code backprops through the dummy GD step multiple times to compute the gradient of the converging process.
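To make the "backprop through the dummy GD step multiple times" idea concrete, here is a minimal NumPy sketch (not the repo's code) of Neumann-series RBP on a toy quadratic inner problem. All names (`A`, `b`, `c`, `eta`) are hypothetical; the assumption is that the inner GD update $F(w) = w - \eta \nabla f(w)$ has converged to a fixed point, so repeatedly applying the transposed Jacobian $J^\top$ of that single step to $v$ and summing approximates $(I - J^\top)^{-1} v$, the gradient through the whole converging process.

```python
import numpy as np

# Hypothetical quadratic inner problem: f(w) = 0.5 w^T A w - b^T w.
# GD fixed point: w* = A^{-1} b.  Outer loss: L = 0.5 ||w* - c||^2.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite
b = rng.standard_normal(4)
c = rng.standard_normal(4)
eta = 0.05                         # inner GD step size

# Run the inner GD to (near) convergence.
w = np.zeros(4)
for _ in range(2000):
    w = w - eta * (A @ w - b)      # one "dummy" GD step F(w)

# Neumann-series RBP: backprop through the *same* GD step repeatedly.
# J = dF/dw = I - eta*A is the Jacobian of one step at the fixed point.
v = w - c                          # dL/dw*, taken at the converged weights
g = np.zeros(4)                    # accumulates sum_k (J^T)^k v
for _ in range(500):
    g = g + v
    v = v - eta * (A @ v)          # v <- J^T v  (A symmetric, so J^T = J)

# dF/db = eta*I, so dL/db = eta * g; analytically this equals A^{-1}(w* - c).
grad_rbp = eta * g
grad_exact = np.linalg.solve(A, w - c)
print(np.allclose(grad_rbp, grad_exact, atol=1e-5))
```

Note that the vector $v$ is seeded with the gradient evaluated at the converged (post-GD) weights, which matches why the linked implementation differentiates after the dummy GD step: at a fixed point the pre- and post-step weights coincide, and what matters is propagating $v$ back through the step's Jacobian.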