quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Why do the hyperparameters for the search pipeline (train_search.py) and the final evaluation pipeline (train.py) differ so much? #107

Open NdaAzr opened 5 years ago

NdaAzr commented 5 years ago

I am wondering why the hyperparameters used during architecture search differ from those used in the final evaluation pipeline.

For example, here are the hyperparameters for CIFAR, in the format search value -> final evaluation value:

cells: 8 -> 20
batch size: 64 -> 96
initial channels: 16 -> 36
epochs: 50 -> 600
drop path probability: 0.3 -> 0.2
auxiliary head: no -> yes (weight 0.4)
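For quick reference, the same comparison as a small, self-contained Python snippet (the dictionary is just an illustration I put together from the values above; it is not part of the repo, and the authoritative numbers are the argparse defaults in the two scripts):

# Search (train_search.py) vs. final evaluation (train.py) settings for CIFAR.
# Illustrative summary only, not code from the repository.
HPARAMS = {
    'cells (layers)':   {'search': 8,    'eval': 20},
    'batch_size':       {'search': 64,   'eval': 96},
    'init_channels':    {'search': 16,   'eval': 36},
    'epochs':           {'search': 50,   'eval': 600},
    'drop_path_prob':   {'search': 0.3,  'eval': 0.2},
    'auxiliary_weight': {'search': None, 'eval': 0.4},
}

for name, v in HPARAMS.items():
    print('%-18s search=%-6s eval=%s' % (name, v['search'], v['eval']))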

NdaAzr commented 5 years ago

I found the answer to this question here:

https://openreview.net/forum?id=S1eYHoC5FX

For convolutional cells:

Our setup of #cells (8->20), #epochs (600) and weight for the auxiliary head (0.4) in the final evaluation exactly follows Zoph et al., 2018. The #init_channels is enlarged from 16 to 36 to ensure a comparable model size (~3M) with other baselines. Given those settings, we then use the largest possible batch size (96) for a single GPU. The drop path probability was tuned wrt the validation set among the choices of (0.1, 0.2, 0.3) given the best cell learned by DARTS.

YANGWAGN commented 5 years ago

Hi NdaAzr! In the code, I know how to use train_search.py; however, I don't see the code for obtaining the best architecture and saving the final architecture parameters, nor how to construct the architecture from those parameters in train.py. Thank you!

NdaAzr commented 5 years ago

Hi @YANGWAGN,

When you run train_search.py, model.genotype() gives you the best cell learned so far. So you need to run the search for a number of epochs and keep the genotype from the epoch with the highest validation accuracy.
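For illustration, here is a minimal sketch of that bookkeeping. Only model.genotype() comes from the repo; the search_epoch and valid_accuracy callables are placeholders standing in for the training and validation steps of train_search.py:

def keep_best_genotype(model, num_epochs, search_epoch, valid_accuracy):
    # Sketch only: decode the cell after every search epoch and keep the one
    # that achieved the best validation accuracy.
    best_acc, best_genotype = 0.0, None
    for epoch in range(num_epochs):
        search_epoch(epoch)              # one epoch of weight + architecture updates
        acc = valid_accuracy()           # accuracy on the held-out validation split
        genotype = model.genotype()      # decode the current cell
        if acc > best_acc:
            best_acc, best_genotype = acc, genotype
    return best_genotype                 # paste this into genotypes.py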

Then you need to add this genotype to genotypes.py and run train.py. See the example here:

DARTS_v2 = Genotype(normal=[('dil_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 1), ('sep_conv_3x3', 2), ('sep_conv_3x3', 2), ('sep_conv_3x3', 0), ('skip_connect', 1), ('dil_conv_3x3', 0)], normal_concat=range(2, 6), reduce=[('dil_conv_3x3', 1), ('sep_conv_3x3', 0), ('max_pool_3x3', 0), ('dil_conv_5x5', 2), ('dil_conv_5x5', 3), ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('max_pool_3x3', 1)], reduce_concat=range(2, 6))

DARTS = DARTS_v2
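As far as I can tell from the code, train.py then looks the architecture up by name via its --arch flag (default DARTS), roughly genotype = eval("genotypes.%s" % args.arch), and passes the resulting Genotype to the evaluation network. So after adding the line above, the README's evaluation command (python train.py --cutout --auxiliary for CIFAR-10) should train your discovered cell from scratch.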