Question about selecting architectures

melodyguan / enas

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"

https://arxiv.org/abs/1802.03268

Apache License 2.0

1.58k stars 390 forks

Question about selecting architectures #39

Open hsl0529 opened 6 years ago

hsl0529 commented 6 years ago

After running cifar10_micro_search.sh, I got a set of architectures and their accuracies. I then selected one architecture with a relatively high accuracy and retrained it from scratch to get a high final accuracy. That is what we are supposed to do, right? The problem is that even if I choose a low-accuracy architecture, or a random one, I still get a high accuracy after retraining it from scratch: they all reach about 96%. There seems to be little or no difference between architectures. Has anyone else run into this? Or did I do something wrong?

bkj commented 6 years ago

The 2.89% error rate in the paper uses cutout regularization, which cifar10_micro_final.sh does not use by default. Without cutout, the number in the paper is 3.54%, but they don't report the variance. When I trained using their exact setup, I got an error rate of 3.88%. I also trained 4 random architectures, and the best had an error rate of 3.95%.

So I think we need a larger-scale study of the distribution of scores attained through ENAS and through random sampling (definitely for the micro search space, but probably for the others as well).

~ Ben
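The comparison Ben proposes could be sketched roughly as follows. This is only an illustration, not code from the enas repository: `summarize` and `welch_t` are hypothetical helper names, and the inputs are assumed to be lists of final test error rates (in percent) that you measured yourself, one per retrained architecture.

```python
import math
import statistics


def summarize(errors):
    """Summarize a list of final test error rates (one per retrained architecture)."""
    return {
        "n": len(errors),
        "mean": statistics.mean(errors),
        "std": statistics.stdev(errors),  # sample standard deviation, needs n >= 2
        "best": min(errors),              # lowest error rate in the sample
    }


def welch_t(errors_a, errors_b):
    """Welch's t-statistic for the difference in mean error between two samples.

    A value near 0 suggests the two groups are hard to tell apart
    given the run-to-run variance.
    """
    var_a = statistics.variance(errors_a)
    var_b = statistics.variance(errors_b)
    standard_error = math.sqrt(var_a / len(errors_a) + var_b / len(errors_b))
    return (statistics.mean(errors_a) - statistics.mean(errors_b)) / standard_error
```

With one list for ENAS-selected architectures and one for randomly sampled ones, `summarize` gives the spread that a single run hides, and `welch_t` gives a rough sense of whether the gap between the two groups is larger than the noise.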

AranKomat commented 6 years ago

Thanks, Ben. I hope you can share the (acc [shared], acc [fixed]) data points, even if you only have a few, so we can see how strongly the two correlate and how each is distributed. In the paper, only 10 arcs were compared for selection, but if the correlation is strong and the variation of acc [fixed] is large, it would be worth comparing thousands of arcs, which isn't too costly.
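One way to check the correlation asked about above, given paired (acc [shared], acc [fixed]) measurements, is sketched below. The function names are hypothetical and nothing here comes from the enas code; it assumes two equal-length lists of accuracies, one pair per architecture. Spearman's rank correlation is included because, for selecting architectures, what matters is whether the shared-weights ranking matches the retrained ranking.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Calling `spearman(acc_shared, acc_fixed)` on the collected pairs gives a value near 1 if the shared-weights accuracy ranks architectures the same way retraining does, and a value near 0 if it carries little ranking information.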

hsl0529 commented 6 years ago

Thanks for your reply, Ben. Can you tell me how to add cutout regularization? I haven't done this before. Or could you share a link to a paper about cutout regularization?
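For reference, cutout is described in "Improved Regularization of Convolutional Neural Networks with Cutout" (DeVries and Taylor, 2017). The idea is to zero out one randomly placed square patch in each training image. Below is a minimal NumPy sketch of that idea, not the code used in this repository: the `cutout` function, its `size=16` default, and the (H, W, C) image layout are assumptions made for illustration.

```python
import numpy as np


def cutout(image, size=16, rng=None):
    """Return a copy of `image` with one square patch set to zero.

    image: array of shape (H, W, C).
    size:  side length of the patch in pixels (assumed default, tune as needed).
    rng:   optional numpy Generator, for reproducibility.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]

    # The patch center may fall anywhere in the image,
    # so the patch can be clipped at the border.
    center_y = int(rng.integers(0, height))
    center_x = int(rng.integers(0, width))

    y0 = max(center_y - size // 2, 0)
    y1 = min(center_y + size // 2, height)
    x0 = max(center_x - size // 2, 0)
    x1 = min(center_x + size // 2, width)

    out = image.copy()          # leave the input untouched
    out[y0:y1, x0:x1] = 0       # mask all channels in the patch
    return out
```

It would be applied per image during training only, after the usual padding, cropping, and flipping, and never at test time.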