quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

An issue about Figure 2 in the paper #116

Open jjjjjie opened 5 years ago

jjjjjie commented 5 years ago

Hi! Thanks for sharing this work.

I have a small doubt about the result in Figure 2 for ξ = 0.5. I tried to reproduce the figure, but with ξ = 0.5 the iterates still converge to (2, 2), the same as with ξ = 0, and alpha is never updated. So I calculated the gradient of alpha at the initial point. The gradient of the train loss w.r.t. w is gd = 2w - 2a, which is then multiplied by ξ = 0.5. The gradient of alpha is therefore [w - ξ·gd] - 2 = [w - 0.5(2w - 2a)] - 2 = a - 2 = 0 because a = 2, so alpha should not be updated. But in Figure 2, alpha is updated at the beginning and then converges to (1, 1).

I wonder whether I have misunderstood the second-order approximation. If I am wrong, I hope you can tell me. Thanks in advance.
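
A minimal Python sketch of the calculation above, for anyone who wants to check it numerically. It assumes the toy losses from the Figure 2 caption of the paper, L_train(w, α) = w² - 2wα + α² and L_val(w, α) = αw - 2α + 1, with the start point (α, w) = (2, -2); these definitions are not stated in this thread, so treat them as an assumption. All function names are hypothetical and this is not the repository's code. It compares the gradient with w' held fixed (the calculation above) against a finite-difference derivative of α ↦ L_val(w'(α), α), which also counts the dependence of w' = w - ξ(2w - 2α) on α.

```python
# Assumed toy problem (from the Figure 2 caption of the DARTS paper):
#   L_train(w, a) = w^2 - 2*w*a + a^2
#   L_val(w, a)   = a*w - 2*a + 1


def l_val(w, a):
    """Assumed validation loss."""
    return a * w - 2.0 * a + 1.0


def grad_w_train(w, a):
    """Gradient of the assumed training loss w.r.t. w (gd = 2w - 2a)."""
    return 2.0 * w - 2.0 * a


def virtual_step(w, a, xi):
    """One virtual gradient step on w: w' = w - xi * gd."""
    return w - xi * grad_w_train(w, a)


def alpha_grad_fixed_w(w, a, xi):
    """dL_val/da with w' treated as a constant: w' - 2."""
    return virtual_step(w, a, xi) - 2.0


def alpha_grad_through_w(w, a, xi, eps=1e-6):
    """Central finite difference of a -> L_val(w'(a), a).

    Unlike alpha_grad_fixed_w, w' is recomputed for each perturbed a,
    so the dependence of w' on a is included.
    """
    def objective(x):
        return l_val(virtual_step(w, x, xi), x)

    return (objective(a + eps) - objective(a - eps)) / (2.0 * eps)


if __name__ == "__main__":
    w0, a0, xi = -2.0, 2.0, 0.5
    print("w' fixed:      ", alpha_grad_fixed_w(w0, a0, xi))
    print("w' depends on a:", alpha_grad_through_w(w0, a0, xi))
```

Under these assumptions the first value is 0 and the second is about 2, which suggests the difference comes from the term that differentiates through w'. Whether that explains the reproduction result depends on how the α-gradient was implemented there.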

epriwahyu commented 5 years ago

Hi jjjjjie, if you don't mind, could I ask for your help? I want to start using this architecture, but I don't know where to start.

Big thanks

jjjjjie commented 5 years ago

@epriwahyu Of course! But which architecture do you want to use? Just the evaluation part or the search part?

epriwahyu commented 5 years ago

@jjjjjie: FYI, I am a freshman in CS. I have already tried to install it and follow the steps here: https://github.com/quark0/darts.

But I am still confused about what I should do next.

My plan is to apply this to an image classification project (Automated Optical Inspection).

If you can give me some guidance, I would really appreciate it.

Thanks