keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

Confusing (and incorrect) results_summary and error for weight initialisation #74

Closed amjass12 closed 4 years ago

amjass12 commented 5 years ago

Hi all,

I was wondering if someone could clarify a few things for me; I would be very grateful! :)

I have run keras tuner with the following code in order to optimise a model by unit number and layer number:

def build_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 20)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
    model.add(layers.Dense(24, activation='sigmoid'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=5,
    directory='/Users/blablabla/Desktop/',
    project_name='Optimal model')

tuner.search_space_summary()

tuner.search(X_train.values, y_train,
             epochs=50,
             validation_data=(X_test.values, y_test),
             shuffle=True
             )

This code runs without any issue! However, when calling tuner.results_summary():

The results summary is completely off from the actual training results (both those seen during the hyperparameter search and those I get if I just train the model without the tuner, in plain Keras).

tuner.results_summary=
[Results summary]
 |-Results in /Users/jassim01/Desktop/CRUK Cambridge/R/Amir/Optimal model
 |-Ran 5 trials
 |-Ran 25 executions (5 per trial)
 |-Best val_accuracy: **0.5650**

The best val_accuracy is reported as 0.56, which is completely wrong compared with the best performance I can achieve when running Keras alone or during the hyperparameter search. During the search I am achieving, according to the tuner's progress report, in excess of 90% [see one line as example].

Is this a bug, or am I misinterpreting something?

[CPU: 43%]Epoch 28/50: 100%|██████████| 5/5 [00:00<00:00, 61.46steps/s, loss=0.09, accuracy=0.969, val_loss=0.184, val_accuracy=0.913

Secondly, when calling best_model:

best_model = tuner.get_best_models(num_models=1)[0]

I get the error: ValueError: Weights for model sequential_2 have not yet been created. Weights are created when the Model is first called on inputs or build()

I understand this error; however, none of the tutorials call model.build() after the for loop in their examples. Since the model is actually able to train, is this a bug?

If not, is it OK to add an input_dim inside the for loop as follows?

model.add(layers.Dense(input_dim=5078, units=hp.Int('units_' + str(i), min_value=32, max_value=512, step=32),.. etc

If I do this, tuner.results_summary() shows a best val_accuracy of 0.90, and get_best_models also works.

I just want to confirm that this is correct.

Thanks for your help!!

jamlong commented 5 years ago

> I get the error: ValueError: Weights for model sequential_2 have not yet been created. Weights are created when the Model is first called on inputs or build()

That's a separate issue - I'm moving it to issue 75: https://github.com/keras-team/keras-tuner/issues/75


> if not.. is it ok to add an input_dim inside the for loop as follows?

What you are doing shouldn't cause any issues with the metrics tracking. At the end of the day, if the model compiles and runs but the reported results don't match the results of the trials, something has almost certainly gone wrong on the Keras Tuner side.

I was able to get something reproducible from what you have above, and created an SSCCE (Short, Self-Contained, Correct Example) from which I'll create a regression test.

I've confirmed that the sort for "val_accuracy" values appears to be min-first instead of max-first, which is not desirable and is likely why you are getting lower, incorrect results.
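
To illustrate the kind of ordering problem described here, the following is a minimal plain-Python sketch. It is not Keras Tuner's actual code: the trial records, their scores, and the helper `best_trials` are all hypothetical and only show how the sort direction decides which value is reported as "best".

```python
# Hypothetical trial records; the scores are invented for illustration only.
trials = [
    {"trial_id": "a", "val_accuracy": 0.565},
    {"trial_id": "b", "val_accuracy": 0.913},
    {"trial_id": "c", "val_accuracy": 0.802},
]


def best_trials(trials, metric, direction):
    """Return the trials ordered best-first for the given objective direction."""
    if direction not in ("max", "min"):
        raise ValueError("direction must be 'max' or 'min'")
    # Accuracy-like metrics want the largest value first; loss-like metrics
    # want the smallest value first.
    return sorted(trials, key=lambda t: t[metric], reverse=(direction == "max"))


best_first = best_trials(trials, "val_accuracy", "max")
worst_first = best_trials(trials, "val_accuracy", "min")

print(best_first[0]["val_accuracy"])   # 0.913
print(worst_first[0]["val_accuracy"])  # 0.565
```

Sorting an accuracy metric with the "min" direction puts the weakest trial at the top, which would match a summary that reports a much lower "best" value than the one seen during training.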

jamlong commented 5 years ago

Also, a clarification as I look back at this: you should only be setting input_dim on your first layer; the rest of the dimensions are inferred from it.
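
As a rough illustration of what "inferred" means, here is a small plain-Python sketch. It does not use Keras; the helper `dense_kernel_shapes` is hypothetical, and the layer sizes other than 5078 and 24 are made up. It shows that once the input size of the first layer is known, every later fully connected layer takes its input size from the previous layer's output size.

```python
def dense_kernel_shapes(input_dim, units_per_layer):
    """Return the (inputs, outputs) kernel shape of each dense layer in a stack."""
    shapes = []
    fan_in = input_dim  # only the first layer needs this from the user
    for units in units_per_layer:
        shapes.append((fan_in, units))
        fan_in = units  # the next layer's input size is this layer's output size
    return shapes


shapes = dense_kernel_shapes(5078, [64, 32, 24])
print(shapes)  # [(5078, 64), (64, 32), (32, 24)]
```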

amjass12 commented 5 years ago

Hi @jamlong

Thank you for your detailed response! I am happy this is reproducible, and I am guessing it is in the process of being fixed.

Re: input data: I just want to clarify, because it is something that has caused a lot of confusion for me.

The input_dim has to be in the first layer: does this mean that input_dim in the for loop, as follows:

def build_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 20)):
        model.add(layers.Dense(input_dim=5078, units=hp.Int('units_' + str(i), min_value=32, max_value=512, step=32), etc

is not correctly specified? In other words, will it treat the input dim as 5078 for each layer? My results_summary would suggest this isn't the case.

And with regard to building the model: is issue #75 the workaround for not having an input_dim within the for loop? What does specifying the input dimension in a layer before the for loop look like? Sorry if this is a loaded question; I think I am misunderstanding something, because if I try to add an input layer before the for loop, I get an error stating that units have to be specified for this layer...

thanks!

gabrieldemarmiesse commented 4 years ago

I believe this was fixed in 01d9422

amjass12 commented 4 years ago

Thank you! Just on a side note:

I found that if I do indeed add a first layer to the model as follows:

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(5078,)))  # added: explicit input layer
    for i in range(hp.Int('num_layers', 2, 20)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
    model.add(layers.Dense(24, activation='sigmoid'))  # etc.

the model runs fine, and the summary now also outputs the correct best-performing model (highest accuracy first)...

omalleyt12 commented 4 years ago

@amjass12 Thanks! Going to close this issue, as it seems to be fixed and/or split into other issues. Please reopen if you are still seeing the original error, or file a new issue if you are seeing a different error.

b18062a commented 3 years ago

Hi @omalleyt12 and @jamlong, I believe the issue around logging minimum (instead of maximum) validation accuracy is back?

My JSON log file for each trial contains the minimum value for val_accuracy (which sounds identical to the issue above) and is quite different to the maximum I observe in the summaries being printed while testing. Have I stuffed up my code?

I believe I'm fully up to date with all my libraries, as I installed everything two weeks ago (I was running on Google Colab before that).

This issue took a while to diagnose, and I sadly don't have the know-how to fix it. I looked at fix #74 but couldn't understand it. Therefore, I'm currently ignoring val_accuracy in my HiPlot plots due to this issue.

For reference, I'm leaning on this tutorial: https://medium.com/roonyx/neural-network-hyper-parameter-tuning-with-keras-tuner-and-hiplot-7637677821fa. I have made changes, such as using a Bayesian optimiser instead of random search and adding more layers, but I can't imagine either of those is causing this.

Any help would be truly and deeply appreciated! :)

class CNNHyperModel(HyperModel):

    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape
        self.num_classes = num_classes

    def build(self, hp):
        model = tf.keras.Sequential()
        model.add(
            tf.keras.layers.Conv2D(
                filters=hp.Choice(
                    'num_filters_1',
                    values=[32, 64, 128],
                    default=64,
                ),
                kernel_size=3,
                activation='relu',
                input_shape=self.input_shape
            )
        )
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
        model.add(
            tf.keras.layers.Dropout(
                rate=hp.Float(
                    'dropout_1',
                    min_value=0.0,
                    max_value=0.5,
                    default=0.25,
                    step=0.05
                )
            )
        )
        model.add(
            tf.keras.layers.Conv2D(
                filters=hp.Choice(
                    'num_filters_2',
                    values=[256, 512],
                    default=512, #was 64
                ),
                activation='relu',
                kernel_size=3
            )
        )
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
        model.add(
            tf.keras.layers.Dropout(
                rate=hp.Float(
                    'dropout_2',
                    min_value=0.0,
                    max_value=0.1,
                    default=0.05,
                    step=0.025
                )
            )
        )

        model.add(
            tf.keras.layers.Conv2D(
                filters=hp.Choice(
                    'num_filters_3',
                    values=[128, 256, 512],
                    default=256, #was 64
                ),
                activation='relu',
                kernel_size=3
            )
        )
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
        model.add(
            tf.keras.layers.Dropout(
                rate=hp.Float(
                    'dropout_3',
                    min_value=0.0,
                    max_value=0.1,
                    default=0.05,
                    step=0.025
                )
            )
        )

        model.add(
            tf.keras.layers.Conv2D(
                filters=hp.Choice(
                    'num_filters_4',
                    values=[128, 256, 512],
                    default=256, #was 64
                ),
                activation='relu',
                kernel_size=3
            )
        )
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
        model.add(
            tf.keras.layers.Dropout(
                rate=hp.Float(
                    'dropout_4',
                    min_value=0.0,
                    max_value=0.25,
                    default=0.05,
                    step=0.05
                )
            )
        )

        model.add(tf.keras.layers.Flatten())
        model.add(
            tf.keras.layers.Dense(
                units=hp.Int(
                    'units',
                    min_value=256,
                    max_value=512,
                    step=32,
                    default=512
                ),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu'], #took out: 'sigmoid' and , 'tanh'
                    default='relu'
                )
            )
        )
        model.add(tf.keras.layers.Dense(self.num_classes, activation='softmax'))

        model.compile(
            optimizer=tf.keras.optimizers.Adam(
                hp.Float(
                    'learning_rate',
                    min_value=5e-5,
                    max_value=1e-3,
                    sampling='LOG',
                    default=1e-4
                )
            ),
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        return model

hypermodel = CNNHyperModel(input_shape=(img_height, img_width, 1), num_classes=11)
max_trials = 120
project_name = '11class_v9c_130px_batchsize50_4Conv1Dense'

tuner = BayesianOptimization(
    hypermodel,
    objective='val_loss',
    seed=42,
    max_trials=max_trials,
    directory=LogStorage,
    project_name=project_name
)

tuner.search_space_summary()

num_of_epochs = 20
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
tuner.search(train_gen, epochs=num_of_epochs, validation_data=validation_gen, callbacks=[callback])

doyeljoseph commented 3 years ago

Hi, when I try to use Keras Tuner for hyperparameter tuning on an LSTM regression problem, I get a syntax error on the 4th layer. Oddly, when I put the same layer in the 3rd position the syntax error is not shown, which implies that the syntax error is not on that particular line. Could someone kindly help me out with this issue? The code follows.

def build_model(hp):
    model = keras.Sequential()
    model.add(LSTM(hp.Int("input_units", min_value=16, max_value=256, step=16), activation='relu', input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))

    for i in range(hp.Int("num_layers", 1, 4)):
        model.add(LSTM(hp.Int(f"LSTM_{i}_units", min_value=16, max_value=256, step=16), activation='relu', return_sequences=False))
        model.add(Dropout(hp.Int(f"Dropout_{i}_units", min_value=0.1, max_value=0.9, step=0.1))
        model.add(Dense(hp.Int(f"Dense_{i}_units", min_value=1, max_value=60, step=2))

    model.compile(optimizer=keras.optimizers.Adam(
            hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])),
            loss="categorical_crossentropy", metrics=["accuracy"])
    model.summary()
    return model

The error appears on the 4th layer, which in this case is the Dense layer. Is this a common issue with Keras Tuner, or is it an error in the syntax I have written? Kindly help me out.
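
Judging only from the code as posted, the Dropout and Dense lines each appear to close one parenthesis fewer than they open, so the statement never ends and the parser runs on into the following line. That would explain why the error seems to move with the layer order rather than staying on one line. The sketch below reproduces the effect with Python's `ast` module on trimmed-down, hypothetical copies of two such lines (the `"rate"` name and the arguments are illustrative, not the poster's exact code). Which line number gets reported depends on the interpreter version, so the example only checks whether a SyntaxError is raised.

```python
import ast

# Hypothetical, shortened versions of the two lines; nothing here is executed,
# the strings are only parsed.
broken = (
    'model.add(Dropout(hp.Float("rate", 0.1, 0.9))\n'   # one ")" short
    'model.add(Dense(1))\n'
)
fixed = (
    'model.add(Dropout(hp.Float("rate", 0.1, 0.9)))\n'  # parentheses balanced
    'model.add(Dense(1))\n'
)


def syntax_error_line(source):
    """Return the line number Python reports for a SyntaxError, or None."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return exc.lineno
    return None


print(syntax_error_line(broken) is not None)  # True: the unclosed call fails to parse
print(syntax_error_line(fixed))               # None: the balanced version parses
```

Separately, a dropout rate is a fraction, so `hp.Float` (as used in the CNN example earlier in this thread) is probably a better fit for it than `hp.Int`.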