What base learning rate did you use with a batch size of 256? I tried 0.1 (as Figure 3 of the paper suggests), but got poor results.
I see that the default learning rate in your code is 0.04. Does that value work?
Yes. The README.md reports that a learning rate of 0.04 with batch size 256 yielded an accuracy of 71.21%, which is 6.61 percentage points higher than the baseline accuracy of 64.60%.