quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Searched architecture training time. #80

Open zhaohui-yang opened 5 years ago

zhaohui-yang commented 5 years ago

Thank you for sharing this beautiful code! I used the default DARTS_V2 architecture, with approximately 3.3M parameters, to train on the CIFAR-10 dataset. However, I found that it takes about 24 h to train for 600 epochs on a V100 GPU. Is this expected? For other architectures such as ResNet, only about 6 h is needed to train on CIFAR-10.

Each epoch takes about 2.5 min with batch size = 128 (the default is 96; I increased it, but that should only speed up training).
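
(As a sanity check on the arithmetic: 600 epochs × ~2.5 min/epoch ≈ 25 h, so the ~24 h total is consistent with the per-epoch time; the real question is why each epoch is this slow for a 3.3M-parameter model.)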

04/03 10:38:45 AM epoch 410 lr 5.692012e-03
04/03 10:38:46 AM train 000 1.231754e-01 96.093750 100.000000
04/03 10:39:03 AM train 050 1.354196e-01 97.104782 99.969368
04/03 10:39:20 AM train 100 1.354751e-01 97.199875 99.976791
04/03 10:39:37 AM train 150 1.358063e-01 97.138863 99.984482
04/03 10:39:53 AM train 200 1.365068e-01 97.127640 99.976677
04/03 10:40:11 AM train 250 1.384853e-01 97.099106 99.978218
04/03 10:40:28 AM train 300 1.398857e-01 97.085236 99.971443
04/03 10:40:45 AM train 350 1.402510e-01 97.057510 99.973289
04/03 10:40:59 AM train_acc 97.085999
04/03 10:40:59 AM valid 000 1.164408e-01 95.312500 100.000000
04/03 10:41:02 AM valid 050 1.418064e-01 96.155029 99.892776
04/03 10:41:03 AM valid_acc 96.089996

MarkAlive commented 5 years ago

I encountered the same problem. Training a NASNet-like network takes longer than training a ResNet, even though the ResNet has more parameters and FLOPs.

Catosine commented 5 years ago

Well, here is the thing: DARTS is fast ONLY when searching for models; it is indeed very slow when retraining the searched models.

Best,

skx6 commented 5 years ago

Did the author use cutout while training? I couldn't find cutout used anywhere except in the argument parser, i.e. parser.add_argument('--cutout', action='store_true', default=False), but it doesn't appear in the main code that follows.

zhaohui-yang commented 5 years ago

> Did the author use cutout while training? I couldn't find cutout used anywhere except in the argument parser, i.e. parser.add_argument('--cutout', action='store_true', default=False), but it doesn't appear in the main code that follows.

https://github.com/quark0/darts/blob/master/cnn/utils.py#L72
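
To spell out what that link points to: the flag is consumed inside the CIFAR-10 transform builder in cnn/utils.py rather than in train.py, which is why it is easy to miss. Roughly like this (paraphrased from memory, so names and details may differ slightly from the actual file):

```python
import numpy as np
import torch
import torchvision.transforms as transforms


class Cutout(object):
    """Zero out a random square patch of the normalized image tensor."""

    def __init__(self, length):
        self.length = length

    def __call__(self, img):
        h, w = img.size(1), img.size(2)
        mask = np.ones((h, w), np.float32)
        y, x = np.random.randint(h), np.random.randint(w)
        y1, y2 = np.clip(y - self.length // 2, 0, h), np.clip(y + self.length // 2, 0, h)
        x1, x2 = np.clip(x - self.length // 2, 0, w), np.clip(x + self.length // 2, 0, w)
        mask[y1:y2, x1:x2] = 0.
        img *= torch.from_numpy(mask).expand_as(img)
        return img


def _data_transforms_cifar10(args):
    CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
    CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
    ])
    if args.cutout:  # <-- the --cutout flag is used here, with args.cutout_length
        train_transform.transforms.append(Cutout(args.cutout_length))
    valid_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
    ])
    return train_transform, valid_transform
```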

Margrate commented 5 years ago

> Did the author use cutout while training? I couldn't find cutout used anywhere except in the argument parser, i.e. parser.add_argument('--cutout', action='store_true', default=False), but it doesn't appear in the main code that follows.
>
> https://github.com/quark0/darts/blob/master/cnn/utils.py#L72

Thank you. Does DARTS_V2 in genotypes.py correspond to DARTS (second order) in the paper?

zhaohui-yang commented 5 years ago

> Thank you. Does DARTS_V2 in genotypes.py correspond to DARTS (second order) in the paper?

Yes it does. I retrained the model and achieved 97.30% accuracy.
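
For reference, here is a minimal, self-contained sketch of how the retraining script maps the --arch string onto a genotype; the cell contents are elided and the lines paraphrase cnn/genotypes.py and cnn/train.py from memory, so details may differ slightly:

```python
import argparse
from collections import namedtuple

Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

# In the real cnn/genotypes.py these tuples are fully populated; elided here.
DARTS_V1 = Genotype(normal=[], normal_concat=[], reduce=[], reduce_concat=[])  # first-order search
DARTS_V2 = Genotype(normal=[], normal_concat=[], reduce=[], reduce_concat=[])  # second-order search
DARTS = DARTS_V2  # default alias points at the second-order cell

parser = argparse.ArgumentParser()
parser.add_argument('--arch', type=str, default='DARTS', help='which architecture to use')
args = parser.parse_args([])  # pass ['--arch', 'DARTS_V2'] to pick the cell explicitly

genotype = eval(args.arch)  # train.py does eval("genotypes.%s" % args.arch)
print(genotype is DARTS_V2)  # True for both the default alias and --arch DARTS_V2
```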

Margrate commented 5 years ago

It takes a long time to search for an architecture and then train it from scratch. If I use other code to train a ResNet from scratch, it takes only about 1/5 of the time of the DARTS code. Maybe the DARTS code is not so efficient?

Catosine commented 5 years ago

> It takes a long time to search for an architecture and then train it from scratch. If I use other code to train a ResNet from scratch, it takes only about 1/5 of the time of the DARTS code. Maybe the DARTS code is not so efficient?

I am searching for models on my own dataset, and it turns out the same as you've noticed.

zhaohui-yang commented 5 years ago

@Margrate In my opinion, the model size and FLOPs are efficient; however, in terms of wall-clock time, a non-serial (multi-branch) architecture like the searched DARTS cell takes much longer per step, and is therefore less efficient in practice.
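
A rough way to see this effect is to benchmark a single wide convolution against a fragmented, multi-branch block of the kind a searched cell contains. The snippet below is purely illustrative (it is not code from this repo, and the layer sizes are made up): the fragmented block has far fewer FLOPs, yet its speed-up is typically much smaller than the FLOP ratio would suggest, because it launches many small, memory-bound kernels plus a concat.

```python
import time
import torch
import torch.nn as nn


class WideConv(nn.Module):
    """One big 3x3 convolution: a single large, GPU-friendly kernel."""
    def __init__(self, c=256):
        super().__init__()
        self.op = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return self.op(x)


class FragmentedBlock(nn.Module):
    """Four parallel branches of small separable convs plus a concat,
    loosely mimicking the fragmented structure of a searched cell."""
    def __init__(self, c=256):
        super().__init__()
        b = c // 4
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c, b, 1),
                nn.Conv2d(b, b, 3, padding=1, groups=b),  # depthwise 3x3
                nn.Conv2d(b, b, 1),
            )
            for _ in range(4)
        ])

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)


def bench(model, x, iters=50):
    """Average forward time per batch, with warm-up and CUDA sync."""
    for _ in range(5):
        model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.time() - start) / iters


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(64, 256, 32, 32, device=device)
    with torch.no_grad():
        for name, m in [("wide conv", WideConv()), ("fragmented block", FragmentedBlock())]:
            m = m.to(device).eval()
            print(f"{name}: {bench(m, x) * 1e3:.2f} ms / batch")
```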