equivalent implementation of net from keras

haarburger commented 8 years ago

I'm trying to port a simple DNN that I wrote in Keras to skflow. The Keras version is:

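    # Keras 1.x-era API (init=, nb_epoch=)
    from keras.models import Sequential
    from keras.layers.core import Dense
    from keras.optimizers import Adagrad

    # learning_rate, loss, n_epoch, batch_size and the train/test arrays are defined elsewhere.
    model = Sequential()
    # Only the first layer needs input_dim; later layers infer their input size.
    model.add(Dense(100, input_dim=40, init="uniform", activation="relu"))
    model.add(Dense(10, init="uniform", activation="relu"))
    model.add(Dense(200, init="uniform", activation="relu"))
    model.add(Dense(4, init="uniform", activation="linear"))  # 4 linear outputs
    optimizer = Adagrad(lr=learning_rate, epsilon=1e-06)
    model.compile(loss=loss, optimizer=optimizer)
    model.fit(X_train, y_train,
              nb_epoch=n_epoch,
              batch_size=batch_size,
              validation_data=(X_test, y_test))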

My skflow version looks like this:

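    import skflow
    import tensorflow as tf


    def model(X, y):
        # Stack of fully connected ReLU layers, followed by a linear regression head.
        features = skflow.ops.dnn(X, hidden_units=[40, 100, 10, 200],
                                  activation=tf.nn.relu)
        return skflow.models.linear_regression(features, y)


    # n_classes=0 makes the estimator a regressor.
    regressor = skflow.TensorFlowEstimator(model_fn=model, n_classes=0,
                                           steps=80000,
                                           learning_rate=learning_rate,
                                           batch_size=batch_size)
    regressor.fit(X_train, y_train)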

My DNN has four outputs and I am using the same data in both implementations. However, the results are still very different and I'm wondering why. Are the two implementations equivalent in their topology? What could have gone wrong?
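One more difference worth ruling out is training length: Keras counts `nb_epoch` passes over the data, while skflow's `steps=80000` counts minibatch updates. A minimal sketch of the conversion, assuming Keras runs a final partial batch in each epoch; the helper names and the sample counts in the example are hypothetical:

```python
import math


def epochs_to_steps(n_epochs, n_samples, batch_size):
    """Minibatch updates performed in n_epochs, counting a final partial batch per epoch."""
    return n_epochs * math.ceil(n_samples / batch_size)


def steps_to_epochs(steps, n_samples, batch_size):
    """Approximate number of passes over the data covered by `steps` updates."""
    return steps * batch_size / n_samples


# Hypothetical example: 10 epochs over 1000 samples with batches of 32.
print(epochs_to_steps(10, 1000, 32))
# Hypothetical example: how many epochs 80000 steps cover for 10000 samples.
print(steps_to_epochs(80000, 10000, 32))
```

If the two numbers differ by a large factor, the networks are simply trained for different amounts of time, which alone can make their results diverge.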

ilblackdragon commented 8 years ago

Hey,

I'm not super familiar with Keras, but here are a few questions:

1) It seems like you added an additional layer with 40 units before the 100+10+200 layers in skflow?
2) You should specify the same optimizer by passing optimizer='Adagrad' to skflow.TensorFlowEstimator. Right now you seem to be using SGD.
3) What is the loss function in Keras? skflow uses mean squared error for regression (n_classes=0).
4) How do the losses compare, e.g. after 10k/20k steps with the same optimizer and number of layers?
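To see why point 2 matters even with an identical network, here is a toy sketch (not Keras or skflow code) that fits a single weight by mean squared error with plain SGD and with Adagrad at the same learning rate. The data and the `mse_and_grad` and `train` helpers are hypothetical, and the real library optimizers differ in details such as where epsilon is added and how the accumulator is initialized:

```python
import math

# Hypothetical toy data: y = 3x, fitted with a single weight w.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0 * x for x in XS]


def mse_and_grad(w):
    """Return (mean squared error, dLoss/dw) for the model y_hat = w * x."""
    n = len(XS)
    loss = sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / n
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / n
    return loss, grad


def train(optimizer, lr, steps, eps=1e-6):
    """Run plain SGD or Adagrad from w = 0 and return the loss before each step."""
    w, accum, history = 0.0, 0.0, []
    for _ in range(steps):
        loss, g = mse_and_grad(w)
        history.append(loss)
        if optimizer == "sgd":
            # SGD: step size is proportional to the raw gradient.
            w -= lr * g
        elif optimizer == "adagrad":
            # Adagrad: divide by the root of the accumulated squared gradients.
            accum += g * g
            w -= lr * g / (math.sqrt(accum) + eps)
        else:
            raise ValueError("unknown optimizer: " + optimizer)
    return history


sgd = train("sgd", 0.01, 20)
ada = train("adagrad", 0.01, 20)
print("SGD final loss:", sgd[-1])
print("Adagrad final loss:", ada[-1])
```

Both runs start from the same loss, but Adagrad's first step is only about `lr` in size (the gradient is divided by its own magnitude) and later steps shrink further, so after 20 steps SGD has reached a much lower loss here. The point is only that the two optimizers follow different trajectories, so loss curves are not comparable until both sides use the same optimizer.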

haarburger commented 8 years ago

Thanks a lot! You spotted the problem in question 1, where I incorrectly added the number of units in my input layer to hidden_units.
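The fix can be double-checked by listing the dense layers each definition produces. A minimal sketch, assuming every layer is fully connected with a bias term, that the input width (40) comes from X rather than from `hidden_units`, and that the 4-unit output layer comes from skflow's `linear_regression` head; `dense_stack` and `n_params` are hypothetical helpers, not skflow APIs:

```python
def dense_stack(input_dim, hidden_units, output_dim):
    """Return (fan_in, fan_out) for each fully connected layer in the stack."""
    widths = [input_dim] + list(hidden_units) + [output_dim]
    return list(zip(widths[:-1], widths[1:]))


def n_params(layers):
    """Count weights plus biases across all layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)


N_FEATURES, N_OUTPUTS = 40, 4

keras_net = dense_stack(N_FEATURES, [100, 10, 200], N_OUTPUTS)
skflow_original = dense_stack(N_FEATURES, [40, 100, 10, 200], N_OUTPUTS)
skflow_fixed = dense_stack(N_FEATURES, [100, 10, 200], N_OUTPUTS)

for name, net in [("keras", keras_net),
                  ("skflow original", skflow_original),
                  ("skflow fixed", skflow_fixed)]:
    print(name, net, n_params(net))
```

The original skflow list yields five layers, starting with an extra 40 to 40 layer, while `hidden_units=[100, 10, 200]` reproduces the Keras layout exactly.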

tensorflow/skflow #124