lld533 closed this issue 5 years ago.
Hi!
Following the original Lua implementation, the learning rate for BS=128 should be 0.1, since it uses lr=0.05 for BS=64 and lr=0.025 for BS=32.
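The reasoning above is the linear scaling rule: the learning rate grows in proportion to the batch size. Here is a minimal sketch, assuming that rule holds and anchoring it to the BS=32 / lr=0.025 setting from the Lua implementation. The function name `scaled_lr` and its parameters are hypothetical and are not part of either code base.

```python
def scaled_lr(batch_size, base_lr=0.025, base_batch_size=32):
    """Hypothetical helper: linear scaling rule, lr proportional to batch size.

    Assumes the reference point lr=0.025 at BS=32 (from the Lua setup).
    """
    return base_lr * batch_size / base_batch_size


# Reproduce the values discussed in this thread: 32 -> 0.025, 64 -> 0.05, 128 -> 0.1
for bs in (32, 64, 128):
    print(bs, scaled_lr(bs))
```

Under this assumption, BS=128 maps to lr=0.1, which matches the default value.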
Thanks!
Hi,
May I ask which initial learning rate was used in the CIFAR-10 and CIFAR-100 experiments (-b 128 on 2 GPUs)? Was it the default value of 0.1 or the value of 0.05 from the sample command? Many thanks in advance!