Thanks for your comment; let me look into this carefully. The changes in Keras and other dependencies will likely limit exact reproducibility, but I suspect that overall reproducibility should not be hindered. As you mentioned, there are quite a few API changes in Keras 2.0 that were not backwards compatible, hence the issues. I will review your commits and, in the meantime, try to create a smaller/toy example that reproduces the observation.
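As a minimal illustration of the kind of rename involved (my sketch, not code from this repo): Keras 2.0 renamed Convolution2D to Conv2D, border_mode to padding, and nb_epoch to epochs.

from keras.models import Sequential
from keras.layers import Conv2D

# Keras 1.x style, no longer accepted by Keras 2.0:
#   model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(32, 32, 3)))
#   model.fit(x, y, nb_epoch=10)

# Equivalent Keras 2.0 style (channels-last input shape assumed):
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)))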
I took the CIFAR10 example in Keras' examples folder and trained it via small-batch (SB) and large-batch (LB) Adam to yield the following graph. I chose epochs=100 arbitrarily; I suspect the testing/training accuracies for LB Adam can be improved if it is run for longer.
The code to reproduce this is below. It should run out of the box for Keras 2.0 + Theano; I have not tested it with TensorFlow.
from __future__ import print_function
import numpy
numpy.random.seed(1337)
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import matplotlib.pyplot as plt
num_classes = 10
epochs = 100
# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
# Let's train the model using Adam
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
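# Save the initial weights (x0) so both the SB and LB runs start from the same point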
model.save_weights('x0.h5')
# Small-batch (SB) run
model.fit(x_train, y_train,
          batch_size=256,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
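# Snapshot the SB solution; .get_value() assumes Theano shared variables
# and is therefore Theano-specific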
sb_solution = [p.get_value() for p in model.trainable_weights]
# Restart from the same initial point and run large-batch (LB) training
model.load_weights('x0.h5')
model.fit(x_train, y_train,
          batch_size=5000,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
lb_solution = [p.get_value() for p in model.trainable_weights]
# parametric plot data collection:
# evaluate theta(alpha) = alpha*theta_LB + (1 - alpha)*theta_SB
# on the interval [-1, 2], discretized into 25 pieces
alpha_range = numpy.linspace(-1, 2, 25)
data_for_plotting = numpy.zeros((25, 4))
i = 0
for alpha in alpha_range:
    # load the interpolated weights into the model (Theano-specific set_value)
    for p in range(len(sb_solution)):
        model.trainable_weights[p].set_value(lb_solution[p]*alpha +
                                             sb_solution[p]*(1 - alpha))
    train_xent, train_acc = model.evaluate(x_train, y_train,
                                           batch_size=5000, verbose=0)
    test_xent, test_acc = model.evaluate(x_test, y_test,
                                         batch_size=5000, verbose=0)
    data_for_plotting[i, :] = [train_xent, train_acc, test_xent, test_acc]
    i += 1
# finally, let's plot the data
# we plot the XENT loss on the left Y-axis
# and accuracy on the right Y-axis
# if you don't have Matplotlib, simply print
# data_for_plotting to file and use a different plotter
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(alpha_range, data_for_plotting[:, 0], 'b-')
ax1.plot(alpha_range, data_for_plotting[:, 2], 'b--')
ax2.plot(alpha_range, data_for_plotting[:, 1]*100., 'r-')
ax2.plot(alpha_range, data_for_plotting[:, 3]*100., 'r--')
ax1.set_xlabel('alpha')
ax1.set_ylabel('Cross Entropy', color='b')
ax2.set_ylabel('Accuracy', color='r')
ax1.legend(('Train', 'Test'), loc=0)
ax1.grid(b=True, which='both')
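# note: savefig assumes the Figures/ directory already exists next to the script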
plt.savefig('Figures/CX.pdf')
First of all, thanks for sharing your code! I really appreciate it!
I tried to fix it to reproduce the results from your paper, but your code is not working out of the box on my system. I made some fixes, but I did not get the same parametric plot; the performance is stuck at 85%. Maybe I did not run it long enough, or I made a mistake when changing your code.
Could you help me reproduce your results?
The major changes I made are:
- There is no mode=2 anymore in the batch normalization, only mode=0.
- Updated how the Conv2D layers are defined.
- Reshaped X_train/X_test to get the code running.
- Moved to the new keras API, to be compatible with the tensorflow backend (see the sketch in the P.S. below).

Thanks for your help.
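P.S.: here is a minimal sketch of the backend-agnostic weight handling I mean, using model.get_weights()/model.set_weights() instead of the Theano-only .get_value()/.set_value() calls. The names mirror the script above; this illustrates the idea and is not the exact change from my commits.

import numpy

# model.get_weights()/set_weights() return and accept plain numpy arrays,
# so they work with both the Theano and the tensorflow backends
sb_solution = model.get_weights()   # after the small-batch fit
model.load_weights('x0.h5')
# ... large-batch fit as above ...
lb_solution = model.get_weights()   # after the large-batch fit

for i, alpha in enumerate(numpy.linspace(-1, 2, 25)):
    interpolated = [lb*alpha + sb*(1 - alpha)
                    for lb, sb in zip(lb_solution, sb_solution)]
    model.set_weights(interpolated)
    # evaluate and record as in the original loop ...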