hungsing92 closed this issue 6 years ago
Hello: As the author said in the paper, "The initial learning rate is set to 0.6 and decreased by a factor of 10 every 30 epochs. All models are trained for 100 epochs from scratch, using the weight initialisation strategy described in [8]". When I read the paper called "Delving deep into rectifiers: Surpassing human-level performance on ImageNet" at https://arxiv.org/pdf/1502.01852.pdf, I think the authors use a weight_decay of 0.0005.
We follow the weight_decay setting of the non-SE counterparts (e.g. 0.0001 for SE-ResNet, 0.0002 for SE-BN-Inception).
Hi Hujie:
First of all, thanks for your excellent work! I can't find the weight decay parameter in your paper. Would you please tell me which value you used? ^^
Best regards, hungsing
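
For anyone reproducing these settings, here is a minimal sketch in plain Python of the step schedule quoted from the paper (initial learning rate 0.6, divided by 10 every 30 epochs, 100 epochs total) together with the weight_decay values given in the reply. The names `step_lr`, `WEIGHT_DECAY`, and `NUM_EPOCHS` are hypothetical and only for illustration; this is not the authors' training code.

```python
# Hypothetical helper: step learning-rate schedule as quoted from the paper.
def step_lr(epoch, base_lr=0.6, gamma=0.1, step_size=30):
    """Return the learning rate for a 0-indexed epoch.

    The rate starts at base_lr and is multiplied by gamma
    (i.e. divided by 10) once every step_size epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# weight_decay values as stated in the reply in this thread
# (same as the non-SE counterparts).
WEIGHT_DECAY = {
    "SE-ResNet": 1e-4,
    "SE-BN-Inception": 2e-4,
}

# Total training length quoted from the paper.
NUM_EPOCHS = 100

if __name__ == "__main__":
    # Print the rate at the first epoch of each stage of the schedule.
    for epoch in range(0, NUM_EPOCHS, 30):
        print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.4g}")
```

With 100 epochs and a step of 30, the rate drops three times, at epochs 30, 60, and 90. In a real framework the weight_decay value would be passed to the optimizer, and how it is applied (for example, whether it also affects biases and batch-norm parameters) depends on the framework, which this thread does not specify.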