sjhan91 opened 4 years ago
Here is what I have been able to figure out:
- `Architect._hessian_vector_product` is for Eq. 8.
- In `Architect._backward_step_unrolled`:
  - I agree with you that the first loop is for Eq. 7.
  - The second loop copies the resulting architecture gradients (`dalpha`) from the temporary `unrolled_model` instance into the `.grad` fields of `self.model`'s architecture parameters. It only stores the gradient; it does not update alpha.
- In `Architect.step`, the call to `self.optimizer.step()` is what actually performs gradient descent to update alpha.
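For reference, here are the equations mentioned above, restated as we recall them from the DARTS paper. The notation is ξ for the inner learning rate, ε for the small finite-difference step, and w' for the one-step unrolled weights:

```math
\begin{aligned}
&\text{(6)}\quad \nabla_\alpha \mathcal{L}_{val}(w^*(\alpha), \alpha) \approx \nabla_\alpha \mathcal{L}_{val}(w', \alpha), \qquad w' = w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha) \\
&\text{(7)}\quad \nabla_\alpha \mathcal{L}_{val}(w', \alpha) - \xi \, \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha) \, \nabla_{w'} \mathcal{L}_{val}(w', \alpha) \\
&\text{(8)}\quad \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha) \, \nabla_{w'} \mathcal{L}_{val}(w', \alpha) \approx \frac{\nabla_\alpha \mathcal{L}_{train}(w^+, \alpha) - \nabla_\alpha \mathcal{L}_{train}(w^-, \alpha)}{2\epsilon}, \qquad w^{\pm} = w \pm \epsilon \, \nabla_{w'} \mathcal{L}_{val}(w', \alpha)
\end{aligned}
```

All three expressions produce a gradient for alpha, not an updated alpha. The descent step itself (roughly alpha minus a learning rate times that gradient) is left to the optimizer attached to the architecture parameters.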
I understand. Thanks a lot!
Hi, I'm confused about some code details of the alpha update.
I think the first loop computes Eq. 7, which gives the value of Eq. 6. After that, I would expect a gradient-descent step to update alpha, but the code only performs a copy operation (the second loop).
What is wrong with my understanding?
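To see why a copy followed by `optimizer.step()` is enough, here is a toy, pure-Python sketch of one architecture step under the second-order approximation. It is an illustration under stated assumptions, not the repository's code. `Param` and `SGD` are hypothetical stand-ins for a PyTorch parameter and optimizer. The weight and alpha are scalars, and the two quadratic losses are made up. Plain SGD replaces whatever optimizer the repository attaches to alpha, and the finite-difference scale `r=1e-2` is chosen for illustration.

```python
# Toy sketch of one DARTS-style architecture step (second-order approximation).
# Assumptions: scalar weight w, scalar architecture parameter alpha,
# made-up quadratic losses, and plain SGD as the architecture optimizer.

class Param:
    """Hypothetical stand-in for a parameter tensor with a .grad field."""
    def __init__(self, data):
        self.data = data
        self.grad = None

class SGD:
    """Hypothetical minimal optimizer: step() is where the descent happens."""
    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = None

    def step(self):
        # Plain gradient descent: data <- data - lr * grad
        for p in self.params:
            if p.grad is not None:
                p.data -= self.lr * p.grad

# Assumed training loss: L_train(w, a) = 0.5 * (w - a)**2
def grad_w_train(w, a):
    return w - a

def grad_a_train(w, a):
    return a - w

# Assumed validation loss: L_val(w, a) = 0.5 * (w - 1)**2 + 0.5 * a**2
def grad_w_val(w, a):
    return w - 1.0

def grad_a_val(w, a):
    return a

def hessian_vector_product(vector, w, a, r=1e-2):
    """Eq. 8: central finite difference of grad_a L_train around w."""
    if vector == 0:
        return 0.0  # the product with a zero vector is zero
    eps = r / abs(vector)
    w_plus = w + eps * vector
    w_minus = w - eps * vector
    return (grad_a_train(w_plus, a) - grad_a_train(w_minus, a)) / (2 * eps)

def architecture_step(w, alpha, optimizer, xi):
    optimizer.zero_grad()
    # Unrolled weight w' = w - xi * grad_w L_train(w, alpha)
    w_prime = w - xi * grad_w_train(w, alpha.data)
    dalpha = grad_a_val(w_prime, alpha.data)
    vector = grad_w_val(w_prime, alpha.data)
    implicit = hessian_vector_product(vector, w, alpha.data)
    # "First loop": Eq. 7, subtract xi times the Hessian-vector product
    dalpha -= xi * implicit
    # "Second loop": only stores the gradient; alpha.data is not changed here
    alpha.grad = dalpha
    # The actual gradient-descent update of alpha
    optimizer.step()
    return dalpha

w = 3.0                # network weight, held fixed during the architecture step
alpha = Param(0.5)     # architecture parameter
arch_opt = SGD([alpha], lr=0.1)
g = architecture_step(w, alpha, arch_opt, xi=0.1)
print(g, alpha.data)
```

With these quadratic losses the finite difference is exact up to rounding. The computed gradient should therefore match the analytic derivative ξ(w' − 1) + α = 0.1 · 1.75 + 0.5 = 0.675. The single SGD step then moves alpha from 0.5 to about 0.4325. Until `optimizer.step()` runs, `alpha.data` is untouched, which is the point of the answer above.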