keskarnitish / large-batch-training

Code to reproduce some of the figures in the paper "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
MIT License
138 stars 23 forks