zhaohui-yang opened this issue 5 years ago
I also encountered the same problem: training a NASNet-like network takes longer than training a ResNet, even though the ResNet has more parameters and FLOPs.
Well, here is the thing: DARTS is fast ONLY when searching for models; it is indeed very slow when retraining the searched models.
Best,
Did the author use cutout while training? I didn't find cutout used anywhere except in the parser, just here: `parser.add_argument('--cutout', action='store_true', default=False)`. But it doesn't appear in the main code that follows.
https://github.com/quark0/darts/blob/master/cnn/utils.py#L72
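To spell it out for others who hit this: the --cutout flag is consumed in utils.py, not in train.py itself; _data_transforms_cifar10 appends a Cutout transform to the training pipeline when the flag is set. A rough sketch of what the linked code does (paraphrased from the repo, so details may differ slightly):

```python
import numpy as np
import torch

class Cutout(object):
    """Zero out a random length x length square of an image tensor (CHW)."""
    def __init__(self, length):
        self.length = length

    def __call__(self, img):
        h, w = img.size(1), img.size(2)
        mask = np.ones((h, w), np.float32)
        # pick a random center and clip the square to the image borders
        y, x = np.random.randint(h), np.random.randint(w)
        y1, y2 = np.clip(y - self.length // 2, 0, h), np.clip(y + self.length // 2, 0, h)
        x1, x2 = np.clip(x - self.length // 2, 0, w), np.clip(x + self.length // 2, 0, w)
        mask[y1:y2, x1:x2] = 0.0
        return img * torch.from_numpy(mask).expand_as(img)

# In _data_transforms_cifar10, the flag finally takes effect, roughly like:
# if args.cutout:
#     train_transform.transforms.append(Cutout(args.cutout_length))
```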
Thank you. Does DARTS_V2 in genotypes.py correspond to DARTS (second order) in the paper?
Yes it does. I retrained the model and achieved 97.30% accuracy.
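In case it helps others: train.py resolves the --arch string against genotypes.py, so retraining the second-order cell is just a matter of passing --arch DARTS_V2 (if I remember correctly, genotypes.py also aliases DARTS = DARTS_V2, which is why the default works). A minimal sketch of the lookup, assuming you run it from the repo's cnn/ directory:

```python
# Run from the repo's cnn/ directory so genotypes.py is importable.
import genotypes

arch = 'DARTS_V2'                    # the value you would pass as --arch
genotype = getattr(genotypes, arch)  # train.py does eval("genotypes.%s" % args.arch)
print(genotype)                      # (op, input-edge) pairs for the normal and reduce cells
```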
Searching for an architecture and then training from scratch takes a long time. If I use other code to train a ResNet from scratch, it takes only about 1/5 of the time of the DARTS code. Maybe the DARTS code is not that efficient?
I am searching for models on my own dataset, and I see the same behavior you've noticed.
@Margrate In my opinion, the model size and FLOPs are efficient; however, in terms of inference time, a non-serial architecture like the DARTS cell takes much more time and is thus less efficient.
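To make the point concrete, here is a toy comparison (my own sketch, not the DARTS code): two networks with roughly equal FLOPs, one a serial stack of convolutions and one built from multi-branch blocks that concatenate several thin convolutions, the way a NAS cell does. The branchy network launches many small kernels per layer, so its wall-clock time is typically worse even at the same FLOP count:

```python
import time
import torch
import torch.nn as nn

class BranchyBlock(nn.Module):
    """8 thin parallel convs concatenated back to 64 channels.
    Same FLOPs as one 64->64 3x3 conv, but 8 kernel launches."""
    def __init__(self, channels=64, branches=8):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels // branches, 3, padding=1)
             for _ in range(branches)])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
serial = nn.Sequential(*[nn.Conv2d(64, 64, 3, padding=1) for _ in range(8)]).to(device)
branchy = nn.Sequential(*[BranchyBlock() for _ in range(8)]).to(device)

x = torch.randn(32, 64, 32, 32, device=device)
with torch.no_grad():
    for name, net in [('serial', serial), ('multi-branch', branchy)]:
        net(x)  # warm-up pass before timing
        if device == 'cuda':
            torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(20):
            net(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        print(name, '%.1f ms/iter' % ((time.time() - t0) / 20 * 1e3))
```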
Thank you for sharing this beautiful code! I used the default DARTS_V2 architecture, with approximately 3.3M parameters, to train on the CIFAR-10 dataset. However, I found that it requires about 24 h to train for 600 epochs on a V100 GPU. I wonder if this is correct? For other architectures like ResNet, only about 6 h is needed on CIFAR-10.
Each epoch takes about 2.5 min at batch size 128 (the default is 96; I enlarged it, but I think that would only speed training up).
04/03 10:38:45 AM epoch 410 lr 5.692012e-03
04/03 10:38:46 AM train 000 1.231754e-01 96.093750 100.000000
04/03 10:39:03 AM train 050 1.354196e-01 97.104782 99.969368
04/03 10:39:20 AM train 100 1.354751e-01 97.199875 99.976791
04/03 10:39:37 AM train 150 1.358063e-01 97.138863 99.984482
04/03 10:39:53 AM train 200 1.365068e-01 97.127640 99.976677
04/03 10:40:11 AM train 250 1.384853e-01 97.099106 99.978218
04/03 10:40:28 AM train 300 1.398857e-01 97.085236 99.971443
04/03 10:40:45 AM train 350 1.402510e-01 97.057510 99.973289
04/03 10:40:59 AM train_acc 97.085999
04/03 10:40:59 AM valid 000 1.164408e-01 95.312500 100.000000
04/03 10:41:02 AM valid 050 1.418064e-01 96.155029 99.892776
04/03 10:41:03 AM valid_acc 96.089996
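For what it's worth, those numbers are self-consistent rather than a bug: at ~2.5 min per epoch, 600 epochs comes out to roughly 25 h, which matches the ~24 h reported above. A trivial sanity check:

```python
# Back-of-the-envelope: does 2.5 min/epoch explain the reported ~24 h?
epochs = 600
minutes_per_epoch = 2.5
print(epochs * minutes_per_epoch / 60)  # 25.0 hours -- consistent with ~24 h
```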