keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

loss exploding very high for keras-tuner regression problem #345

Open · Palashio opened this issue 4 years ago

Palashio commented 4 years ago

Using most recent version of keras-tuner.

Here is what my code looks like.


    def build_model(hp):
        model = keras.Sequential()
        for i in range(hp.Int('num_layers', min_layers, max_layers)):
            model.add(Dense(units=hp.Int('units_' + str(i),
                                         min_value=min_dense,
                                         max_value=max_dense,
                                         step=step),
                            activation=activation))
            model.add(Dropout(rate=hp.Float(
                              'dropout_3',
                              min_value=0.0,
                              max_value=0.5,
                              default=0.20,
                              step=0.05)))
        model.add(Dense(1, activation='linear'))
        model.compile(
            optimizer=keras.optimizers.Adam(
                                       hp.Float('learning_rate',
                                                min_value=1e-5,
                                                max_value=1e-2,
                                                sampling='LOG',
                                                default=1e-3)),
            loss='mse')
        return model

    # random search for the model
    tuner = RandomSearch(
        build_model,
        objective='loss',
        max_trials=max_trials,
        executions_per_trial=executions_per_trial,
        directory=directory)
    #tuner.search_space_summary()
    #del data[target]

    X_train, X_test, y_train, y_test = train_test_split(
        data, target, test_size=0.2, random_state=49)

    # searches the tuner space defined by hyperparameters (hp) and returns the
    # best model

    tuner.search(X_train, y_train,
                 epochs=epochs,
                 validation_data=(X_test, y_test),
                 callbacks=[tf.keras.callbacks.TensorBoard('my_dir')])

    models = tuner.get_best_models(num_models=1)[0]
    hyp = tuner.get_best_hyperparameters(num_trials=1)[0]

When I run this code and then rebuild the model from the tuning history using model = tuner.hypermodel.build(best_hps), I get incredibly high losses for some reason (1 million+). The y values are numerical (ranging from 100 to 100,000) and I'm trying to solve a regression problem.

Based on my intuition, I think I'm missing a layer or hyperparameter and it's causing keras-tuner to treat this like a classification problem for some reason.

Here is the code I'm using to rebuild the model with the best hyperparameters:

        model = tuner.hypermodel.build(best_hps)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=49)
        history = model.fit(X_train, y_train,
                            epochs=epochs,
                            validation_data=(X_test, y_test),
                            verbose=verbose)
ben-arnao commented 4 years ago

My guess is this is not an issue with KT, but just that the model built from the supplied hps is not very good. I've seen mse blow up before on some hp combinations, even for models that compile and look fine. If you really want to check whether KT is involved, get the set of hps the loss blows up on and try training outside of KT.
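As a rough sketch, that could look something like this (the layer sizes, dropout rate, and learning rate below are placeholders, not your actual trial values):

# Hedged sketch: hard-code one problematic hp combination and train it by hand,
# completely outside of keras-tuner, to see whether the loss still blows up.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

model = tf.keras.Sequential()
for units in [64, 64, 32]:                  # e.g. units_0, units_1, units_2 from the trial
    model.add(Dense(units, activation='relu'))
    model.add(Dropout(0.2))                 # the dropout rate the trial picked
model.add(Dense(1, activation='linear'))
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='mse')

history = model.fit(X_train, y_train,       # same split you pass to tuner.search()
                    epochs=5,
                    validation_data=(X_test, y_test))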

Palashio commented 4 years ago

@ben-arnao thanks for responding so promptly. I've trained the model outside of KT and the loss is extremely low (0.43). Any other idea what might be happening? Specifically, could you take a look at the build_model function?

Palashio commented 4 years ago

Maybe it has to do with not having a specific input / output layer?

ben-arnao commented 4 years ago

@Palashio maybe do a print(model.summary()) for the two models and compare?

I also notice you're using the old standalone Keras API. Keras is part of TF now, and I believe KT only supports the tensorflow.keras library, although it lets you compile your own model, so I'm not sure that will be the issue.
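For example, a side-by-side check could look roughly like this (manual_model, best_hps, and X_train are placeholders for your own objects):

# Hedged sketch: build a small reference model entirely from tf.keras and print
# both summaries so the two architectures can be compared layer by layer.
import tensorflow as tf
from tensorflow.keras.layers import Dense

manual_model = tf.keras.Sequential()
manual_model.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
manual_model.add(Dense(1, activation='linear'))
manual_model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='mse')
manual_model.summary()

kt_model = tuner.hypermodel.build(best_hps)          # model rebuilt from the tuner's hps
kt_model.build(input_shape=(None, X_train.shape[1]))
kt_model.summary()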

Palashio commented 4 years ago

Just tried replacing keras.Sequential with tf.keras.Sequential and it doesn't seem to make a difference.

This is what the reconstructed model looks like.

[screenshot: model.summary() of the reconstructed model]

Palashio commented 4 years ago

@ben-arnao Is the issue the "multiple" output shape? It's a single regression problem, so it should only output one number.

ben-arnao commented 4 years ago

@Palashio I don't think that's an issue; I believe that just refers to the dimension of the data (batch_size, features), not the number of neurons.

I still think there is some sort of discrepancy between the hyperparameters that give you the high loss in KT and how you build/train your model manually outside of KT when you test those hps for yourself.

Can you post the hyperparameters that are giving you the high loss, along with how you are trying to manually recreate the model? It might be hard to help you further at this point without seeing a fully reproducible example.

Palashio commented 4 years ago

@ben-arnao I'll provide as much information as I can below:

This is the full code where I'm creating the tuner and then trying to recreate the model. The call to the manual recreation is at the bottom.

    def build_model(hp):
        model = keras.Sequential()
        for i in range(hp.Int('num_layers', min_layers, max_layers)):
            model.add(Dense(units=hp.Int('units_' + str(i),
                                         min_value=min_dense,
                                         max_value=max_dense,
                                         step=step),
                            activation=activation))
            model.add(Dropout(rate=hp.Float(
                              'dropout_3',
                              min_value=0.0,
                              max_value=0.5,
                              default=0.20,
                              step=0.05)))
        model.add(Dense(1, activation='linear'))
        model.compile(
            optimizer=keras.optimizers.Adam(
                                       hp.Float('learning_rate',
                                                min_value=1e-5,
                                                max_value=1e-2,
                                                sampling='LOG',
                                                default=1e-3)),
            loss='mse',
            metrics=['accuracy'])
        return model

    # random search for the model
    tuner = RandomSearch(
        build_model,
        objective='loss',
        max_trials=max_trials,
        executions_per_trial=executions_per_trial,
        directory=directory)
    # tuner.search_space_summary()
    # del data[target]

    X_train, X_test, y_train, y_test = train_test_split(
        data, target, test_size=0.2, random_state=49)

    # searches the tuner space defined by hyperparameters (hp) and returns the
    # best model

    tuner.search(X_train, y_train,
                 epochs=epochs,
                 validation_data=(X_test, y_test),
                 callbacks=[tf.keras.callbacks.TensorBoard('my_dir')])

    models = tuner.get_best_models(num_models=1)[0]
    hyp = tuner.get_best_hyperparameters(num_trials=1)[0]
    history = tuner_hist(
        data,
        target,
        tuner,
        hyp,
        epochs=epochs,
        verbose=verbose,
        test_size=test_size)
    """
    Return:
        models[0] : best model obtained after tuning
        best_hps : best hyperparameters obtained after tuning, stored as a map
        history : history of the data executed from the given model
    """
    return models, hyp, history, X_test, y_test

This is how I'm manually recreating the model. Note that when this is called, img is equal to 0.

def tuner_hist(
        X,
        y,
        tuner,
        best_hps,
        img=0,
        epochs=5,
        test_size=0.2,
        verbose=0):
    model = tuner.hypermodel.build(best_hps)

    if img == 0:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=49)
        history = model.fit(X_train, y_train,
                            epochs=epochs,
                            validation_data=(X_test, y_test),
                            verbose=verbose)
    else:
        history = model.fit_generator(X,
                                      epochs=epochs,
                                      validation_data=y,
                                      verbose=verbose)

    return history

As for the hyperparameters, this is an example of what the best hyperparameter values look like:

This is the model.summary() of the reconstructed model: [screenshot of the summary]

These were the hyp.values that were passed to it: {'num_layers': 5, 'units_0': 384, 'dropout_3': 0.15000000000000002, 'units_1': 160, 'learning_rate': 0.0008846517204233912, 'units_2': 32, 'units_3': 32, 'units_4': 32}
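For reference, recreating that exact configuration by hand (outside of KT) should look roughly like the sketch below; it assumes the activation is relu and rounds the shared dropout_3 value to 0.15, since those details aren't shown in the dict above.

# Hedged sketch: the architecture implied by the hyp.values above, built by hand.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

units = [384, 160, 32, 32, 32]               # units_0 ... units_4
model = tf.keras.Sequential()
for u in units:
    model.add(Dense(u, activation='relu'))   # 'activation' assumed to be relu
    model.add(Dropout(0.15))                 # the single shared 'dropout_3' rate
model.add(Dense(1, activation='linear'))
model.compile(optimizer=tf.keras.optimizers.Adam(0.0008846517204233912),
              loss='mse')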

Palashio commented 4 years ago

Additionally, if it helps, I'm currently using it for this dataset: https://www.kaggle.com/camnugent/california-housing-prices and trying to tune a neural network on the median_house_value column. I've already preprocessed all the other columns so that there aren't any categorical ones.

ben-arnao commented 4 years ago

@Palashio I tried recreating the example but there are too many pieces you didn't include.

I still think this is definitely something in your custom code, not in KT. As I explained before, there are times when mse will explode depending on the model that is built and how it is trained; it is nothing that KT is doing wrong, however.

This is why I'm asking you to emulate building and training the model from scratch in the exact same way, using the problematic hps, so that you can see whether the issue is reproducible without KT involved. It will also help you understand how parameterization and model building work in KT.

Just a few things I also see at a quick glance: you are using accuracy as a metric in your compile call, even though this is a regression problem. You are also reusing the same hyperparameter, dropout_3, in every one of your Dropout layers, so they all share a single rate. Lastly, I see that activation is not defined here; that's just another variable that can cause issues if it isn't kept constant while you track the issue down.
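To make the last two points concrete, the loop in build_model could be restructured roughly like the sketch below (the search bounds and the relu activation are example values, not your actual settings):

# Hedged sketch of a revised build_model: one dropout hyperparameter per layer
# instead of the shared 'dropout_3', a fixed example activation, and no accuracy
# metric on an 'mse' regression loss.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

def build_model(hp):
    model = tf.keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 5)):                # example bounds
        model.add(Dense(units=hp.Int('units_' + str(i),
                                     min_value=32,
                                     max_value=512,
                                     step=32),
                        activation='relu'))                    # 'activation' was undefined above
        model.add(Dropout(rate=hp.Float('dropout_' + str(i),   # unique name per layer
                                        min_value=0.0,
                                        max_value=0.5,
                                        default=0.20,
                                        step=0.05)))
    model.add(Dense(1, activation='linear'))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Float('learning_rate',
                     min_value=1e-5,
                     max_value=1e-2,
                     sampling='LOG',
                     default=1e-3)),
        loss='mse')                                            # no accuracy metric for regression
    return model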

If you're still having trouble, I'm going to need a minimal, fully reproducible example to look at this further.

Palashio commented 4 years ago

@ben-arnao Here is a link to a Google Colab workspace where the issue is fully reproduced: https://colab.research.google.com/drive/1sTVMJRXVF5gsQGPQKHtGk9RNV2mlijtc?usp=sharing.

The two datasets are exactly what is being passed into the tuning function after preprocessing. They are also exactly what is being passed to a regular model, which attains losses around 0.2. If the data doesn't load on your end, here is a link to a Google Drive folder with both files: https://drive.google.com/drive/folders/1wq72fY6hz2n9oYL0hqMGwydZmsl_hEI6?usp=sharing.

ben-arnao commented 4 years ago

@Palashio Here we are able to see that this behavior is replicated outside of KT. So this doesn't seem like an issue with KT, but just the general nature of your data and model choices causing the loss to blow up. Maybe the data isn't scaled first? I'm not too familiar with the specifics of your experiment, but I would work on getting a stable loss without keras-tuner involved and then give it another shot.

from tensorflow.keras.layers import Dense
from tensorflow import keras
from sklearn.model_selection import train_test_split
import tensorflow as tf
import pandas as pd

data = pd.read_csv('housing1.csv')
target = pd.read_csv('target.csv')

X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=49)

model = tf.keras.Sequential()
model.add(Dense(32,
                activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss='mse')

history = model.fit(X_train, y_train,
                    epochs=5,
                    validation_data=(X_test, y_test),
                    verbose=1)
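If scaling does turn out to be the problem, here is a rough sketch of what I'd try on top of the snippet above (StandardScaler is just one option, and the _s variable names are mine):

# Hedged sketch: standardize the features and the target before fitting; with
# targets up to ~100,000, an average error of ~1,000 already means an mse of 1,000,000.
from sklearn.preprocessing import StandardScaler

x_scaler = StandardScaler()
y_scaler = StandardScaler()

X_train_s = x_scaler.fit_transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_train_s = y_scaler.fit_transform(y_train)    # target frame has a single numeric column
y_test_s = y_scaler.transform(y_test)

# rebuild the model fresh so the earlier run's weights don't carry over
model = tf.keras.Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='mse')

history = model.fit(X_train_s, y_train_s,
                    epochs=5,
                    validation_data=(X_test_s, y_test_s),
                    verbose=1)
# predictions map back to the original units with y_scaler.inverse_transform(...)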