tanglang96 / DDPNAS

Dynamic Distribution Pruning for Efficient Network Architecture Search

possible mistakes in the paper. #1

Closed cxxgtxy closed 5 years ago

cxxgtxy commented 5 years ago

Hi, after reading this paper I have a question. When you train the model based on the ProxylessNAS search space (v2), you mention: "We follow training settings in Cai et al. [2018c], train the models for 120 epochs with a learning rate 0.4 (annealed down to zero following a cosine schedule), and a batch size of 1024 across 4 Tesla V100 GPUs". However, ProxylessNAS's training parameters are quite different from yours. Is this a mistake?
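For reference, the cosine annealing quoted above (learning rate 0.4 decayed to zero over 120 epochs) can be sketched as below; this is a minimal standalone illustration of the standard cosine schedule, not code from either paper, and the function name `cosine_lr` is ours:

```python
import math

def cosine_lr(epoch, total_epochs=120, base_lr=0.4):
    """Cosine schedule: anneal base_lr down to zero over total_epochs,
    matching the settings quoted from the paper (120 epochs, lr 0.4)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# Learning rate starts at 0.4 and decays monotonically toward 0.
print(cosine_lr(0))    # 0.4
print(cosine_lr(120))  # ~0.0
```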

tanglang96 commented 5 years ago

@cxxgtxy Hi, sorry for the confusion. We mean that, except for what we explicitly stated (epochs, batch size, initial learning rate), the other training details (e.g. momentum, label smoothing) are the same as in ProxylessNAS.