Why is the MSE calculated by Keras compile different from the MSE calculated by Scikit-learn?

I'm training a neural network model for forecasting, with mean squared error (MSE) as the loss function. However, the MSE reported by Keras differs substantially from the one I calculate with Scikit-learn on the training set.

Epoch 1/10
162315/162315 [==============================] - 14s 87us/step - loss: 111.8723 - mean_squared_error: 111.8723 - val_loss: 9.5308 - val_mean_squared_error: 9.5308

Epoch 00001: loss improved from inf to 111.87234, saving model to /home/Model/2019.04.26.10.55
Scikit Learn MSE = 208.811126

Epoch 2/10
162315/162315 [==============================] - 14s 89us/step - loss: 4.5191 - mean_squared_error: 4.5191 - val_loss: 3.7627 - val_mean_squared_error: 3.7627

....

Epoch 00010: loss improved from 0.05314 to 0.05057, saving model to /home/Model/2019.04.26.10.55
Scikit Learn MSE = 0.484048

This is how the MSE is calculated by Keras:

model.compile(loss='mse', optimizer='adam', metrics=['mse'])
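As far as I understand, the `loss` that Keras prints for an epoch is the mean of the per-batch losses, each measured while the weights are still being updated, rather than a single pass over the data with the final weights. Here is a minimal NumPy sketch of that bookkeeping. The toy data, the linear model and the learning rate are all made up for illustration and have nothing to do with my real network. Note that in this sketch the running value comes out larger than the end-of-epoch value, whereas in my log the Scikit-learn value is the larger one, so this alone may not explain what I see.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical, unrelated to the real dataset)
X = rng.normal(size=(640, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)            # model weights, updated after every batch
batch_size, lr = 64, 0.05
batch_losses = []

for start in range(0, len(X), batch_size):
    xb = X[start:start + batch_size]
    yb = y[start:start + batch_size]
    err = xb @ w - yb
    # Loss is recorded BEFORE the update that this batch triggers
    batch_losses.append(np.mean(err ** 2))
    # One gradient-descent step on the batch MSE
    w -= lr * 2.0 * (xb.T @ err) / len(xb)

# Epoch-level value in the style of a progress bar: mean of per-batch losses
running_epoch_mse = float(np.mean(batch_losses))

# Value in the style of an end-of-epoch callback: all data, final weights
end_of_epoch_mse = float(np.mean((X @ w - y) ** 2))

print(running_epoch_mse, end_of_epoch_mse)
```

The two numbers differ even though both are plain MSE values on the same data, because they are measured with different weights.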

This is how I calculate the MSE with Scikit-learn, in a callback:

from keras.callbacks import Callback
from sklearn.metrics import mean_squared_error


class my_callback(Callback):
    def __init__(self, trainX, trainY, model_name):
        super().__init__()
        self.trainset_X = trainX
        self.trainset_Y = trainY
        self.model_name = model_name
        self.previous_mse = 10000000  # best (lowest) MSE seen so far

    def on_train_begin(self, logs=None):
        return

    def on_train_end(self, logs=None):
        return

    def on_epoch_begin(self, epoch, logs=None):
        return

    def on_epoch_end(self, epoch, logs=None):
        # ----- Train -----
        # Recompute the MSE over the whole training set with the current weights
        y_pred = self.model.predict(self.trainset_X, batch_size=64)
        curr_mse = mean_squared_error(self.trainset_Y, y_pred)
        print('Scikit Learn MSE = %f' % curr_mse)

        # Save the model only when the training MSE improves
        if curr_mse < self.previous_mse:
            print('Save the best model to %s' % self.model_name)
            self.model.save(self.model_name)
            self.previous_mse = curr_mse
        return

    def on_batch_begin(self, batch, logs=None):
        return

    def on_batch_end(self, batch, logs=None):
        return

Do you know why there is such a difference? I also checked the Python source code of both implementations, and they are quite similar.
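To show what I mean by "quite similar", here is a small NumPy sketch of the two formulas as I read them. The helper names `mse_keras_style` and `mse_sklearn_style` are my own (hypothetical) and the arrays are made up. This is only my reading of the two definitions, not the libraries' actual code.

```python
import numpy as np

def mse_keras_style(y_true, y_pred):
    # My reading of Keras' 'mse': mean of squared errors over the last axis
    # for each sample, then the mean over the samples.
    per_sample = np.mean(np.square(y_pred - y_true), axis=-1)
    return float(np.mean(per_sample))

def mse_sklearn_style(y_true, y_pred):
    # My reading of mean_squared_error with default arguments: mean over the
    # samples for each output column, then a uniform mean over the columns.
    per_output = np.mean((y_true - y_pred) ** 2, axis=0)
    return float(np.mean(per_output))

# Made-up targets and predictions with identical shapes (n_samples, 1)
y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
y_pred = np.array([[1.5], [2.0], [2.0], [5.0]])

a = mse_keras_style(y_true, y_pred)
b = mse_sklearn_style(y_true, y_pred)
print(a, b)
```

On identical, equally shaped inputs the two helpers give the same number, so I would expect the gap in my log to come from what is fed into each calculation rather than from the formula itself.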

keras-team / keras-applications, issue #105