quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Code details #142

Open · sjhan91 opened this issue 4 years ago

sjhan91 commented 4 years ago

Hi, I'm confused about the code details of the alpha update.

for g, ig in zip(dalpha, implicit_grads):
  g.data.sub_(eta, ig.data)  # dalpha -= eta * implicit_grads

for v, g in zip(self.model.arch_parameters(), dalpha):
  if v.grad is None:
    v.grad = Variable(g.data)
  else:
    v.grad.data.copy_(g.data)  # write the result into alpha's .grad field

I think the first loop is for Eq. 7, which then gives us the value of Eq. 6. After that, to update alpha, a gradient descent step is needed. But in the code there is just a copy operation (the second loop).
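
For reference, here is my reading of Eqs. 6-8 of the paper, written out in LaTeX (\xi is the inner learning rate, which I take to correspond to eta in the code, and w' = w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha)):

\begin{align}
  % Eq. 6: approximate the bilevel gradient using a single unrolled training step
  \nabla_\alpha \mathcal{L}_{val}\bigl(w^*(\alpha), \alpha\bigr)
    &\approx \nabla_\alpha \mathcal{L}_{val}\bigl(w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha),\, \alpha\bigr) \\
  % Eq. 7: chain rule applied to Eq. 6
    &= \nabla_\alpha \mathcal{L}_{val}(w', \alpha)
       - \xi\, \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha) \\
  % Eq. 8: finite-difference approximation of the Hessian-vector product,
  % with w^{\pm} = w \pm \epsilon\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha)
  \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha)
    &\approx \frac{\nabla_\alpha \mathcal{L}_{train}(w^+, \alpha) - \nabla_\alpha \mathcal{L}_{train}(w^-, \alpha)}{2\epsilon}
\end{align}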

What is wrong with my understanding?

Jasha10 commented 4 years ago

Here is what I have been able to figure out:


  • Architect._hessian_vector_product is for Eq. 8.
  • In Architect._backward_step_unrolled:

    • I agree with you that the first loop is for Eq. 7.
    • The second loop does not perform the update itself; it copies the finished architecture gradient (dalpha) into the .grad fields of self.model.arch_parameters(), carrying the result computed on the temporary unrolled_model back to the real model.
  • In Architect.step, the call to self.optimizer.step() is what actually performs the gradient descent update on alpha (see the sketch below).
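
To convince myself of that last point, here is a minimal, self-contained sketch. It is not the repo's code: the parameter shape, the toy gradient values, and the use of SGD instead of Adam are illustrative assumptions. It only shows why filling .grad by hand and then calling optimizer.step() amounts to a complete gradient descent update.

import torch

# Stand-in for a single architecture parameter (alpha); values are made up.
alpha = torch.zeros(3, requires_grad=True)
# DARTS uses Adam for the architecture parameters; plain SGD keeps the arithmetic obvious.
optimizer = torch.optim.SGD([alpha], lr=0.1)

# Pretend this is the Eq. 7 gradient produced by the two loops above.
dalpha = torch.tensor([1.0, -2.0, 0.5])

optimizer.zero_grad()
if alpha.grad is None:
  alpha.grad = dalpha.clone()   # same role as the "copy" loop in the question
else:
  alpha.grad.copy_(dalpha)

optimizer.step()                # the actual descent: alpha <- alpha - lr * alpha.grad
print(alpha)                    # tensor([-0.1000,  0.2000, -0.0500], requires_grad=True)

So the copy loop only stages the gradient; nothing moves until the optimizer runs.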

sjhan91 commented 4 years ago

I understand. Thanks a lot!