
Option for using dropout in the predict phase (as an approximation to Bayesian DL) #9412

Closed franciscovargas closed 6 years ago

franciscovargas commented 6 years ago

As mentioned in issue #5357 (https://github.com/keras-team/keras/issues/5357#issuecomment-350276900) by @spearsem and @alexchao56, it would be nice if we could enable dropout in the prediction stage of the model and not just in training.

There is solid work motivating this use case as an approximation to Bayesian deep learning http://proceedings.mlr.press/v48/gal16.pdf (in this case as a variational approximation to deep GPs).

Ideally one would be able to run predict multiple times, use the mean of these predictions as the overall prediction, and use their standard deviation to quantify the uncertainty around it.

Other than the feature request, is there a way to work around the current setup in Keras to achieve this?

franciscovargas commented 6 years ago

Potential workaround:

import numpy as np
import keras.backend as K

# for some model with dropout ...
# backend function exposing the learning-phase flag alongside the model input
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes))

    for i in range(n_iter):
        # learning phase = 1 keeps dropout active during prediction
        result[i, :, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty
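
For context, a hedged usage sketch (the test array and class count below are hypothetical):

# hypothetical: x_test has shape (n_samples, n_features), model outputs 10 classes
mean_pred, std_pred = predict_with_uncertainty(f, x_test, no_classes=10, n_iter=100)
print(mean_pred.shape, std_pred.shape)  # both (n_samples, 10)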
JamesAllingham commented 6 years ago

@franciscovargas that workaround seems to be correct, since it was used by Gal in the implementation of the experiments for the paper Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. See the implementation here.

It would still be nice to have this built into Keras so that it works nicely with the model predict functions.

franciscovargas commented 6 years ago

Thanks, I wish I had seen that earlier on today :D ...

fchollet commented 6 years ago

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)
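
With dropout forced on like this, repeated predict calls give Monte Carlo samples; a minimal sketch, assuming a dummy NumPy batch:

import numpy as np

x = np.random.rand(4, 10)                                   # dummy batch of 4 inputs
samples = np.stack([model.predict(x) for _ in range(100)])  # shape (100, 4, 3)
mean, std = samples.mean(axis=0), samples.std(axis=0)       # per-output mean and uncertainty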
franciscovargas commented 6 years ago

Maybe worth adding to the docs to save more questions in the future, since I can't see it on the core layers page for Dropout; no such param is mentioned. It was not immediately clear to me when reading the source that the training flag was for this.

https://keras.io/layers/core/

sanchezismael commented 6 years ago

In the implementation with the training=True parameter in the Dropout layer, are the values scaled in the training phase? Are they scaled in the prediction phase? I am not sure what the training=True parameter is doing.
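
For what it's worth, Keras implements inverted dropout: the kept activations are scaled by 1/(1 - rate) during training, and the layer is an identity at inference, so no extra rescaling happens at prediction time. A quick check, assuming TF 2.x:

import tensorflow as tf

x = tf.ones((1, 10000))
drop = tf.keras.layers.Dropout(0.5)
print(float(tf.reduce_mean(drop(x, training=True))))   # ~1.0: survivors scaled by 1/(1 - 0.5)
print(float(tf.reduce_mean(drop(x, training=False))))  # exactly 1.0: identity at inference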

grantwwoodford commented 6 years ago

@franciscovargas Your method works for me but it seems to cause a memory leak. #10338

chjq201410695 commented 5 years ago

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and I want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10, recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)
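
For reference, the recurrent layers also accept a training argument when called, so a hedged sketch along the same lines (note the 3-D input shape an LSTM expects; the sizes here are made up) might look like:

import keras

inputs = keras.Input(shape=(20, 10))  # (timesteps, features)
x = keras.layers.LSTM(10, recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)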

sergehijo commented 5 years ago

@fchollet thanks a lot !!! works like a charm

abhinav-upadhyay commented 5 years ago

Does the training=True option work with LSTM layers with recurrent_dropout as well?

romanovzky commented 5 years ago

This doesn't seem to work with SpatialDropout layers, any suggestions?

fccoelho commented 5 years ago

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

is this documented anywhere?

alxhrzg commented 5 years ago

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

is this documented anywhere?

I've just stumbled across the same problem. The general question is how to toggle this behaviour when building models with the classical Sequential API rather than the functional call syntax. My hacky quickfix was to inherit from the keras.layers.Dropout class and override its call method. In addition, I added the kwarg training=True to the __init__ method before calling super with the arguments expected by the base class.

import keras
import keras.backend as K

class Dropout(keras.layers.Dropout):
    """Applies Dropout to the input.
    Dropout consists in randomly setting
    a fraction `rate` of input units to 0 at each update during training time,
    which helps prevent overfitting.
    # Arguments
        rate: float between 0 and 1. Fraction of the input units to drop.
        noise_shape: 1D integer tensor representing the shape of the
            binary dropout mask that will be multiplied with the input.
            For instance, if your inputs have shape
            `(batch_size, timesteps, features)` and
            you want the dropout mask to be the same for all timesteps,
            you can use `noise_shape=(batch_size, 1, features)`.
        seed: A Python integer to use as random seed.
    # References
        - [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](
           http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
    """
    def __init__(self, rate, training=None, noise_shape=None, seed=None, **kwargs):
        super(Dropout, self).__init__(rate, noise_shape=noise_shape, seed=seed, **kwargs)
        self.training = training

    def call(self, inputs, training=None):
        if 0. < self.rate < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)
            # fall back to the flag set at construction time when the call
            # does not receive an explicit training argument
            if not training:
                return K.in_train_phase(dropped_inputs, inputs, training=self.training)
            return K.in_train_phase(dropped_inputs, inputs, training=training)
        return inputs

Now you can just pass the argument when adding layers via the Sequential API, such as:

model.add(keras.layers.Dense(512, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(256, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(2, activation="softmax"))
arjangroen commented 5 years ago

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Can you also switch back to the non-dropout prediction after compiling? Or is it compiled in and do you need to make a separate model and transfer the weights?
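
One possible way to keep both behaviours without transferring weights is to call the same shared layers twice and build two functional models from them; a minimal sketch, reusing the small model above:

import keras

inputs = keras.Input(shape=(10,))
dense = keras.layers.Dense(3)
dropout = keras.layers.Dropout(0.5)

x = dense(inputs)
det_outputs = dropout(x)                # follows the usual phase: dropout off at predict time
mc_outputs = dropout(x, training=True)  # dropout always on

det_model = keras.Model(inputs, det_outputs)  # ordinary deterministic predictions
mc_model = keras.Model(inputs, mc_outputs)    # MC-dropout sampling with the same weights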

kpelechrinis commented 5 years ago

@franciscovargas thanks for the workaround.

One question I have is whether Keras rescales the weights during the test phase when dropout is 'enabled'. Theoretically the average you obtain from MC dropout should be similar to the prediction you get when you use all the connections for the same input. However, in my case the output from MC dropout is always much smaller than the prediction without dropout.

qiuriyi commented 4 years ago

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

@fchollet If I use training=True to enable the Dropout, is it possible to turn it off in the testing phase when necessary?

MalteEbner commented 4 years ago

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in https://github.com/tensorflow/tensorflow/issues/34201

kaibrach commented 4 years ago

@MalteEbner : See my suggestion here: https://github.com/tensorflow/tensorflow/issues/34201#issuecomment-577596280

gieses commented 4 years ago

Has anything changed in TF since then? I am getting the same predictions on every iteration with the suggested snippet.

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in tensorflow/tensorflow#34201

jHaselberger commented 4 years ago

@gieses I was wondering too. Uncertainty is always zero

dougzec commented 4 years ago

There is this feature in Keras: it's the training argument in the call of the Dropout layer. Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and I want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10, recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Did you figure it out?

hjliag commented 4 years ago

http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_2248.html

As mentioned in this blog post written by the inventor of MC dropout, fixing the dropped weights for all test inputs makes for a better visualization.

Does anyone have a solution for fixing the dropped weights (i.e. reusing the same dropout mask across inputs) with the Keras Dropout layer?
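
One possible direction (not from this thread) is a custom layer that samples its mask once and reuses it for every input until told to resample; a rough sketch, assuming TF 2.x and 2-D inputs of shape (batch, features), with all names hypothetical:

import tensorflow as tf

class FixedMaskDropout(tf.keras.layers.Layer):
    """Dropout whose mask is sampled once and reused for all inputs (hypothetical sketch)."""
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate
        self.mask = None

    def resample_mask(self, feature_dim):
        # draw a fresh Bernoulli mask; call this once per MC sample
        keep = tf.cast(tf.random.uniform((feature_dim,)) >= self.rate, tf.float32)
        self.mask = keep / (1.0 - self.rate)  # inverted-dropout scaling

    def call(self, inputs):
        if self.mask is None:
            self.resample_mask(inputs.shape[-1])
        return inputs * self.mask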

cgebbe commented 3 years ago

Old thread, but another solution is a layer wrapper. This turned out to be useful in my case:

import tensorflow as tf

class AlwaysInTrain(tf.keras.layers.Wrapper):
    def __call__(self, inputs, *args, **kwargs):
        # force the wrapped layer to run in training mode on every call
        return self.layer(inputs, *args, **kwargs, training=True)

# use as follows
x = AlwaysInTrain(tf.keras.layers.Dropout(0.5))(x)
samanemami commented 3 years ago

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

I am trying to use keras.backend but I received the following error

ValueError: Input tensors to a Functional must come from `tf.keras.Input`. 
 Received: 0 (missing previous layer metadata).

Could anyone please help me with this issue?

icarmi commented 3 years ago

I find that with Tensorflow version 2.5, it's much easier. Just call the model like this:

model(X, training=True)

That's it! (This also works for models that were loaded from disk)
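
A hedged sketch of using that call for MC-dropout sampling (model and X stand in for a trained model and a NumPy batch):

import numpy as np

def mc_predict(model, X, n_iter=100):
    # every forward pass runs in training mode, so dropout stays active
    samples = np.stack([model(X, training=True).numpy() for _ in range(n_iter)])
    return samples.mean(axis=0), samples.std(axis=0)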

uranusx86 commented 2 years ago

I find that with Tensorflow version 2.5, it's much easier. Just call the model like this:

model(X, training=True)

That's it! (This also works for models that were loaded from disk)

It also works in TF 2.3.

anton-brandl commented 2 years ago

I'm a bit sceptical about the proposed solutions that enable training mode for the entire network and not just for the dropout layers. My understanding is that this means other layers will be affected as well, which might have side effects: if BatchNorm is activated during MC inference, you would update the layer statistics every single time you run the forward pass. So the only correct solutions here are those that modify only the dropout layers; the other solutions only work for networks without BatchNorm. Please correct me if I'm wrong!
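
A hedged sketch of that dropout-only approach, combining the per-layer training=True pattern from earlier in the thread with BatchNormalization left in its default inference behaviour (layer sizes are illustrative):

import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.BatchNormalization()(x)          # uses moving statistics at predict time
x = tf.keras.layers.Dropout(0.5)(x, training=True)   # dropout forced on
outputs = tf.keras.layers.Dense(1)(x)
mc_model = tf.keras.Model(inputs, outputs)

# repeated calls now give MC samples without touching the BatchNorm statistics
# preds = [mc_model(x_batch) for _ in range(100)]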