Hi,
After reading this paper, I have a question.
When you train the models based on ProxylessNAS (the MobileNetV2-based search space), you mention: "We follow training settings in Cai et al. [2018c], train the models for 120 epochs with a learning rate 0.4 (annealed down to zero following a cosine schedule), and a batch size of 1024 across 4 Tesla V100 GPUs".
However, ProxylessNAS's training parameters are quite different from yours.
So is this a mistake?
@cxxgtxy
Hi,
Sorry for the confusion. What we mean is that, except for what we have outlined (epochs, batch size, initial learning rate), the other training details (e.g. momentum, label smoothing) should be the same as in Cai et al. [2018c].
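For concreteness, here is a minimal PyTorch sketch of that setup. Only the epochs (120), batch size (1024), initial learning rate (0.4), and cosine annealing to zero are stated in the paper; the momentum (0.9) and label-smoothing (0.1) values below are assumptions, chosen as common ImageNet defaults in the spirit of Cai et al. [2018c].

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 224 * 224, 1000)  # placeholder for the searched network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.4,        # initial learning rate stated in the paper
    momentum=0.9,  # assumption: standard ImageNet momentum, not stated here
)
# Anneal the learning rate to zero over 120 epochs with a cosine schedule
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=120, eta_min=0.0
)
# Assumption: label smoothing of 0.1, a common choice for ImageNet training
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(120):
    # ... one training pass over ImageNet (batch size 1024 across 4 V100s) ...
    scheduler.step()
```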