quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Reproducing the ImageNet results in the paper #92

Open haithanhp opened 5 years ago

haithanhp commented 5 years ago

Hi, thanks for your great implementation of DARTS.

I am trying to reproduce the ImageNet results reported in the paper. I ran train_search.py on CIFAR-10 (with --unrolled) and it found a new architecture:

DARTS_V3 = Genotype(
  normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 1), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 0), ('skip_connect', 1)],
  normal_concat=[2, 3, 4, 5],
  reduce=[('max_pool_3x3', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0),
          ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 3),
          ('max_pool_3x3', 0), ('skip_connect', 2)],
  reduce_concat=[2, 3, 4, 5])
DARTS = DARTS_V3

I then trained this model from scratch on ImageNet, but I only reached a 35% error rate, whereas the paper reports 26.7%.

Could you show, step by step, how to exactly reproduce the ImageNet results?

Thanks, Hai

Catosine commented 5 years ago

Hello @HaiPhan1991, I was wondering which code you used for the retraining, and with which parameters (e.g., number of epochs)?

haithanhp commented 5 years ago

Hi @Catosine,

Thanks for your response. I used train_search.py and train_imagenet.py in the cnn folder and did not change anything (batch size, learning rate, and so on).

By the way, I have one concern about the idea of the paper. You train a predefined network together with alpha (architecture) parameters and obtain the learned alphas, then use those alphas to construct a target network. How can you make sure that this target architecture yields good accuracy? Do you have any idea how to make the search performance-aware?

Thanks, Hai

Catosine commented 5 years ago

Hi @HaiPhan1991 !

Thanks for your reply! I'm not the author, but I have played a bit with the code. For your question,

How can you make sure that this target architecture yields good accuracy?

Actually, the performance of the architecture can only be measured at the retraining stage, and it is pretty hard to tell whether you have a good or bad model right after searching. My suggestion: before trying to find new models, run a couple of classic models (e.g., VGG, ResNet, and other SOTA models) on your dataset and use them as a baseline. If your searched model does better than the baseline, I think it is reasonable to conclude that it is a good one.

Also, I've noticed that your normal cell contains a lot of skip connections. This is a form of "over-searching": a normal cell dominated by skip connections has very few learnable parameters, which will obviously lead to poor retraining performance (see the sketch below). To address this, I recommend reading Progressive DARTS (P-DARTS), a paper that noticed the same issue and proposed "search space regularization" to solve it. The authors also released their code on GitHub, and since P-DARTS is modified from DARTS, it is easy for DARTS users to pick up.
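For reference, a quick way to check this on a searched genotype is to count the skip_connect entries in the normal cell. A minimal sketch, reusing the DARTS_V3 genotype from this thread (the Genotype namedtuple has the same structure as the one in genotypes.py):

from collections import namedtuple

# Same container as in genotypes.py
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

darts_v3 = Genotype(
  normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 1), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 0), ('skip_connect', 1)],
  normal_concat=[2, 3, 4, 5],
  reduce=[('max_pool_3x3', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0),
          ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 3),
          ('max_pool_3x3', 0), ('skip_connect', 2)],
  reduce_concat=[2, 3, 4, 5])

# Each cell is a list of (operation, input_node) pairs
n_skip = sum(1 for op, _ in darts_v3.normal if op == 'skip_connect')
print(n_skip, 'of', len(darts_v3.normal))  # 6 of 8: parameter-free ops dominate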

GL,

haithanhp commented 5 years ago

Hi @Catosine,

Thanks for your response. I tried the code you pointed to and was able to reproduce the ImageNet accuracy. Thanks for your suggestions.

Catosine commented 5 years ago

Glad to hear that! Good luck with the rest :) @HaiPhan1991

NdaAzr commented 5 years ago

@Catosine and @HaiPhan1991, could you please explain how you reconstructed the model and visualized the learned cell? I am using DARTS on a custom dataset.

If I run python visualize.py DARTS to produce figures of the learned cell, it produces the same figures as Figures 4 and 5 in the paper, but I need to visualize the cell learned from my custom dataset. Could you please point me in the right direction?

Thanks, Neda

haithanhp commented 5 years ago

@NdaAzr, just take your searched architecture and give it a name in genotypes.py (My_Model = [your searched architecture]). For example:

DARTS_V3 = Genotype(
  normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 1), ('sep_conv_3x3', 1), ('skip_connect', 0),
          ('skip_connect', 0), ('skip_connect', 1)],
  normal_concat=[2, 3, 4, 5],
  reduce=[('max_pool_3x3', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0),
          ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 3),
          ('max_pool_3x3', 0), ('skip_connect', 2)],
  reduce_concat=[2, 3, 4, 5])

For visualization, just run python visualize.py [architecture name in genotypes.py], for example python visualize.py DARTS_V3.
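For reference, Genotype itself is just a namedtuple defined at the top of genotypes.py, and the script resolves the name you pass on the command line inside that module. A minimal sketch of the lookup, assuming you run it from the cnn directory (using the DARTS entry that ships with the repo):

from collections import namedtuple

# genotypes.py defines the container like this at the top of the file:
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

# The command-line name is then looked up inside the module; equivalent to:
import genotypes
print(getattr(genotypes, 'DARTS'))  # works the same for any entry you add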

Hope this can help.

Thanks, Hai

NdaAzr commented 5 years ago

Hi @HaiPhan1991, thanks a lot, that makes sense. Do you visualize only the cells? Could you please take my attached searched model and give me an example of how to write it as a Genotype? (Here is the searched [architecture](https://www.dropbox.com/s/8yh3xfwmjgg2ra0/out_model.txt?dl=0).)

Here is a snippet of it; I'm not sure if I am doing this right:

Neda_V2 = Genotype(normal=[('ReLUConvBN', 0), ('ReLUConvBN', 1), ('ReLUConvBN', 2), ('FactorizedReduce', 3), ('ReLUConvBN', 4),......

Catosine commented 5 years ago

@NdaAzr Hi there! I retrained the model the same way @HaiPhan1991 described. As for the second question, I haven't tried the visualization, but you can easily read the output genotype directly. For example:

DARTS_V1 = Genotype(
  normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0),
          ('skip_connect', 0), ('sep_conv_3x3', 1),
          ('skip_connect', 0), ('sep_conv_3x3', 1),
          ('sep_conv_3x3', 0), ('skip_connect', 2)],
  normal_concat=[2, 3, 4, 5],
  reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1),
          ('skip_connect', 2), ('max_pool_3x3', 0),
          ('max_pool_3x3', 0), ('skip_connect', 2),
          ('skip_connect', 2), ('avg_pool_3x3', 0)],
  reduce_concat=[2, 3, 4, 5])

Every two tuples in a cell describe the two inputs of one node. Within each tuple, the first element is the operation and the second is the index of the input node (0 and 1 are the cell's two inputs; intermediate nodes are numbered from 2). For example, the first pair above means node 2 receives sep_conv_3x3 applied to input 1 and sep_conv_3x3 applied to input 0.
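To make that concrete, here is a small sketch that prints the wiring encoded in the normal cell of DARTS_V1 above:

normal = [('sep_conv_3x3', 1), ('sep_conv_3x3', 0),
          ('skip_connect', 0), ('sep_conv_3x3', 1),
          ('skip_connect', 0), ('sep_conv_3x3', 1),
          ('sep_conv_3x3', 0), ('skip_connect', 2)]

# Tuples come in pairs: pair i describes the two inputs of node i + 2
for i in range(0, len(normal), 2):
  node = i // 2 + 2
  for op, src in normal[i:i + 2]:
    print('node %d <- %s(node %d)' % (node, op, src))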

GL,

NdaAzr commented 5 years ago

Thank you @Catosine and @HaiPhan1991 for the explanation.

One more question: I am not sure I understand top-k accuracy. What is k here, is it the class index? Is top-1 the accuracy for class 1 and top-5 the accuracy for class 5, or have I misunderstood?

def accuracy(output, target, topk=(1,)):
  """Compute the top-k accuracies of `output` against `target`."""
  maxk = max(topk)             # largest k we need predictions for
  batch_size = target.size(0)

  # Indices of the maxk highest-scoring classes per sample, best first
  _, pred = output.topk(maxk, 1, True, True)
  pred = pred.t()              # shape: [maxk, batch_size]
  # Compare each of the maxk predictions against the true label
  correct = pred.eq(target.view(1, -1).expand_as(pred))

  res = []
  for k in topk:
    # A sample is a top-k hit if its label is among the first k predictions
    # (.reshape instead of .view avoids a contiguity error on newer PyTorch)
    correct_k = correct[:k].reshape(-1).float().sum(0)
    res.append(correct_k.mul_(100.0 / batch_size))
  return res

Catosine commented 5 years ago

@NdaAzr Hello! Top-k is a way of expressing accuracy: a sample counts as correct if its true label appears among the model's k highest-scoring predictions. So top-1 means the highest-scoring prediction is exactly the label, and top-5 means the label is among the first five predictions. GL,
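P.S. A tiny concrete example may help (a toy batch of 2 samples over 4 classes, assuming PyTorch):

import torch

output = torch.tensor([[0.1, 0.5, 0.3, 0.1],   # top score: class 1
                       [0.4, 0.2, 0.3, 0.1]])  # top score: class 0
target = torch.tensor([1, 2])                  # true labels

_, pred = output.topk(2, dim=1, largest=True, sorted=True)
print(pred)  # tensor([[1, 2], [0, 2]])

# Sample 0: label 1 is the 1st prediction -> top-1 and top-2 hit
# Sample 1: label 2 is the 2nd prediction -> top-2 hit only
# So on this batch: top-1 accuracy = 50%, top-2 accuracy = 100%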

thinkInJava33 commented 4 years ago

How long does it take to train on ImageNet? It takes much more time than mentioned in the paper.

Catosine commented 4 years ago

Hello @thinkInJava33, it really depends. From my observation, both the training time and the validation accuracy vary a lot, even with the same search parameters and data. GL, PF

pingguokiller commented 4 years ago

drop_path_prob is set to 0, but the paper says it follows "Regularized Evolution for Image Classifier Architecture Search", which sets it to 0.3.

Does this matter? I used drop_path_prob = 0 and got a test accuracy of 70.8%.
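For context on what that flag controls: drop path randomly zeroes an entire path's output per sample during training and rescales the survivors, acting as a regularizer. A minimal sketch of the usual formulation (not necessarily this repo's exact code):

import torch

def drop_path(x, drop_prob, training=True):
  # Zero each sample's path with probability drop_prob and rescale the
  # survivors by 1/keep_prob so the expected activation is unchanged.
  if training and drop_prob > 0.:
    keep_prob = 1. - drop_prob
    mask = torch.bernoulli(
        torch.full((x.size(0), 1, 1, 1), keep_prob, device=x.device))
    x = x * mask / keep_prob
  return x

# With drop_prob = 0 (as in the question) this is a no-op, so none of the
# extra regularization used for the paper's ImageNet evaluation is applied.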