keskarnitish / large-batch-training

Code to reproduce some of the figures in the paper "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Reproducing the parametric plot #1

Closed: tomMoral closed this issue 7 years ago

tomMoral commented 7 years ago

First of all, thanks for sharing your code! I really appreciate it!

I tried to use it to reproduce the results from your paper, but the code does not work out of the box on my system. I made some fixes, but I did not get the same parametric plot: the performance is stuck at 85%.
Maybe I did not run it for long enough, or I made a mistake when changing your code.
Could you help me reproduce your results?

The major changes I made are:

Thanks for your help.

keskarnitish commented 7 years ago

Thanks for your comment; let me look into this carefully. The changes in Keras and other dependencies will likely limit exact reproducibility but I suspect that overall reproducibility should not be hindered. As you mentioned, there are quite a few API changes in Keras 2.0 which were not backwards compatible, hence causing issues. I will look into your commits and, in the meantime, will also look into creating a smaller/toy example that reproduces the observation.
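
As a minimal sketch of the kind of change involved (Keras 2.0 syntax, with the old Keras 1.x forms shown in comments; the layer sizes are just placeholders):

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
# Keras 1.x: model.add(Convolution2D(32, 3, 3, border_mode='same'))
# Keras 2.0: the kernel size becomes a tuple and border_mode is renamed to padding
model.add(Conv2D(32, (3, 3), padding='same'))

# Keras 1.x: model.fit(x, y, nb_epoch=100, batch_size=256)
# Keras 2.0: nb_epoch is renamed to epochs
# model.fit(x, y, epochs=100, batch_size=256)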

keskarnitish commented 7 years ago

I took the CIFAR10 example in Keras' examples folder and trained it with small-batch (SB) and large-batch (LB) Adam to produce the graph below.

[Figure: Toy Example]

I chose epochs=100 arbitrarily; I think the testing/training accuracies for LB Adam can be improved if it is run for longer.

The code to reproduce this is below. It should run out of the box with Keras 2.0 + Theano; I have not tested it with TensorFlow.

from __future__ import print_function
import numpy
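# fix the numpy random seed before importing Keras, for reproducibility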
numpy.random.seed(1337)
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import matplotlib.pyplot as plt

num_classes = 10
epochs = 100

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# Compile the model with the Adam optimizer and cross-entropy loss
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.save_weights('x0.h5')
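# Small-batch (SB) run: Adam with a batch size of 256, starting from the saved initial weights x0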
model.fit(x_train, y_train,
          batch_size=256,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
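# note: get_value() relies on the Theano backend; with TensorFlow, model.get_weights()
# and model.set_weights() are the portable alternative (see the sketch at the end)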
sb_solution = [p.get_value() for p in model.trainable_weights]

# Large-batch (LB) run: reload x0 and train with a batch size of 5000
model.load_weights('x0.h5')
model.fit(x_train, y_train,
          batch_size=5000,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
lb_solution = [p.get_value() for p in model.trainable_weights]

# parametric plot data collection
# we discretize the interval [-1,2] into 25 pieces
alpha_range = numpy.linspace(-1, 2, 25)
data_for_plotting = numpy.zeros((25, 4))

i = 0
for alpha in alpha_range:
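    # interpolate between the two solutions: x(alpha) = alpha * x_LB + (1 - alpha) * x_SB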
    for p in range(len(sb_solution)):
        model.trainable_weights[p].set_value(lb_solution[p]*alpha +
                                             sb_solution[p]*(1-alpha))
    train_xent, train_acc = model.evaluate(x_train, y_train,
                                           batch_size=5000, verbose=0)
    test_xent, test_acc = model.evaluate(x_test, y_test,
                                         batch_size=5000, verbose=0)
    data_for_plotting[i, :] = [train_xent, train_acc, test_xent, test_acc]
    i += 1

# finally, let's plot the data
# we plot the XENT loss on the left Y-axis
# and accuracy on the right Y-axis
# if you don't have Matplotlib, simply print
# data_for_plotting to file and use a different plotter

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(alpha_range, data_for_plotting[:, 0], 'b-')
ax1.plot(alpha_range, data_for_plotting[:, 2], 'b--')

ax2.plot(alpha_range, data_for_plotting[:, 1]*100., 'r-')
ax2.plot(alpha_range, data_for_plotting[:, 3]*100., 'r--')

ax1.set_xlabel('alpha')
ax1.set_ylabel('Cross Entropy', color='b')
ax2.set_ylabel('Accuracy', color='r')
ax1.legend(('Train', 'Test'), loc=0)

ax1.grid(b=True, which='both')
# note: the Figures/ directory must already exist, otherwise savefig will raise an error
plt.savefig('Figures/CX.pdf')
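
If you want to run this on the TensorFlow backend, the Theano-specific get_value()/set_value() calls are the only part that should need changing. A sketch (untested) of a backend-agnostic version of the interpolation loop, using Keras' own model.get_weights()/model.set_weights():

# capture the solutions with model.get_weights() instead of get_value(), i.e.
# sb_solution = model.get_weights() right after the SB fit and
# lb_solution = model.get_weights() right after the LB fit, then:
for i, alpha in enumerate(alpha_range):
    # x(alpha) = alpha * x_LB + (1 - alpha) * x_SB
    interpolated = [alpha * lb + (1. - alpha) * sb
                    for sb, lb in zip(sb_solution, lb_solution)]
    model.set_weights(interpolated)
    train_xent, train_acc = model.evaluate(x_train, y_train,
                                           batch_size=5000, verbose=0)
    test_xent, test_acc = model.evaluate(x_test, y_test,
                                         batch_size=5000, verbose=0)
    data_for_plotting[i, :] = [train_xent, train_acc, test_xent, test_acc]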