keskarnitish / large-batch-training

Code to reproduce some of the figures in the paper "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
MIT License