Open miaozhang0525 opened 4 years ago
Let's talk about optimization difficulty first. Increasing width makes optimization easier, more skip connections make optimization easier, and increasing depth makes optimization harder. Because of the memory constraint in DARTS, we have to decrease the width from 36 (augment) to 16 (search), which increases the optimization difficulty. So if we still used 20 as the depth, the optimization difficulty during search would be larger than during augment, and this would push the system to choose more skip connections than it needs in order to balance the difficulty. To choose the proper number of skip connections automatically, we need to keep the difficulty that comes from width*depth the same in search and augment. This is where gradient confusion comes in: it is used to measure this difficulty.
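To make "measure this difficulty" a bit more concrete, here is a minimal sketch of one common way to quantify gradient confusion: take the gradients from several mini-batches, compute all pairwise inner products, and report how negative the worst pair is. This is an illustration only, not necessarily the code used in this repo; the function name `gradient_confusion` and the toy gradient vectors are assumptions made for the example.

```python
import numpy as np

def gradient_confusion(grads):
    """Smallest eta >= 0 such that <g_i, g_j> >= -eta for all i != j.

    `grads` is a sequence of flattened per-mini-batch gradient vectors
    (hypothetical input format; at least two vectors are required).
    A larger value means the mini-batch gradients conflict more.
    """
    g = np.asarray(grads, dtype=float)
    gram = g @ g.T                        # all pairwise inner products
    mask = ~np.eye(len(g), dtype=bool)    # drop the diagonal (i == j)
    return max(0.0, float(-gram[mask].min()))

# Toy gradients: the first and third point in conflicting directions.
conflicting = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
aligned = [[1.0, 0.0], [2.0, 0.0]]

print(gradient_confusion(conflicting))
print(gradient_confusion(aligned))
```

Under this sketch, one would compare the value measured for the search network (width 16) at different depths against the value for the augment network (width 36, depth 20), and pick the search depth whose value is closest.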
Hope this explains.
Hi, thanks for sharing your work. Nice work!
But I am confused about how gradient confusion is used to determine the desired depth.
How do you get the desired depth, and how do you keep the architecture search at this depth?
Sincerely