tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

model based on TensorFlow Probability with keras.fit #511

Open zhulingchen opened 5 years ago

zhulingchen commented 5 years ago

I am using tfp *-Flipout layers to construct a Bayesian neural network (BNN) and training it with keras.fit. I define the BNN structure in much the same way as a CNN, but keras.fit() fails with a None gradient error:

ValueError: Variable <tf.Variable 'conv2d_flipout/kernel_posterior_loc:0' shape=(3, 3, 1, 32) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

I am using the following versions of tfp and tf:

tfp.__version__ == '0.7.0'

tf.__version__ == '1.14.0'

Below is a minimal working example on the MNIST dataset. Feel free to comment out the working CNN part to see the BNN error above (either bcnn_model_1 or bcnn_model_2 throws the None gradient error when its fit function is called):

import os
os.environ['KERAS_BACKEND'] = 'tensorflow'  # set up tensorflow backend for keras

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.layers import DenseVariational, DenseReparameterization, DenseFlipout, Convolution2DFlipout, Convolution2DReparameterization
from tensorflow_probability.python.layers import DistributionLambda
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Dense, Conv2D, Flatten, BatchNormalization, Activation, LeakyReLU
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import Adam

tf.enable_eager_execution()
tfd = tfp.distributions

import numpy as np
import matplotlib.pyplot as plt

def neg_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)

def get_neg_log_likelihood_fn(bayesian=False):
    """
    Get the negative log-likelihood function
    # Arguments
        bayesian(bool): Bayesian neural network (True) or point-estimate neural network (False)

    # Returns
        a negative log-likelihood function
    """
    if bayesian:
        def neg_log_likelihood_bayesian(y_true, y_pred):
            labels_distribution = tfp.distributions.Categorical(logits=y_pred)
            log_likelihood = labels_distribution.log_prob(tf.argmax(input=y_true, axis=1))
            loss = -tf.reduce_mean(input_tensor=log_likelihood)
            return loss
        return neg_log_likelihood_bayesian
    else:
        def neg_log_likelihood(y_true, y_pred):
            y_pred_softmax = tf.keras.layers.Activation('softmax')(y_pred)  # logits to softmax
            loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred_softmax)
            return loss
        return neg_log_likelihood

n_class = 10

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = np.expand_dims(X_train, -1)
n_train = X_train.shape[0]
X_test = np.expand_dims(X_test, -1)
n_test = X_test.shape[0]

# Normalize data
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

print("X_train.shape =", X_train.shape)
print("y_train.shape =", y_train.shape)
print("X_test.shape =", X_test.shape)
print("y_test.shape =", y_test.shape)

plt.imshow(X_train[0, :, :, 0], cmap='gist_gray')

lr = 1e-3

def build_cnn_model(input_shape):
    model_in = Input(shape=input_shape)
    x = Conv2D(32, kernel_size=3, padding="same", strides=2)(model_in)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(64, kernel_size=3, padding="same", strides=2)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    model_out = Dense(10, activation='softmax')(x)  # softmax
    model = Model(model_in, model_out)
    return model

def build_bayesian_cnn_model_1(input_shape):
    model_in = Input(shape=input_shape)
    x = Convolution2DFlipout(32, kernel_size=3, padding="same", strides=2)(model_in)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Convolution2DFlipout(64, kernel_size=3, padding="same", strides=2)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = DenseFlipout(512, activation='relu')(x)
    model_out = DenseFlipout(10, activation=None)(x)  # logits
    model = Model(model_in, model_out)
    return model

def build_bayesian_cnn_model_2(input_shape):
    model_in = Input(shape=input_shape)
    x = Convolution2DFlipout(32, kernel_size=3, padding="same", strides=2)(model_in)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Convolution2DFlipout(64, kernel_size=3, padding="same", strides=2)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = DenseFlipout(512, activation='relu')(x)
    x = DenseFlipout(10, activation=None)(x)  # logits
    model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x)  # distribution
    model = Model(model_in, model_out)
    return model

cnn_model = build_cnn_model(X_train.shape[1:])
cnn_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr), metrics=['accuracy'])
print('CNN Model:')
cnn_model.summary()

bcnn_model_1 = build_bayesian_cnn_model_1(X_train.shape[1:])
bcnn_model_1.compile(loss=get_neg_log_likelihood_fn(bayesian=True), optimizer=Adam(lr), metrics=['accuracy'])
print("BCNN Model 1:")
bcnn_model_1.summary()

bcnn_model_2 = build_bayesian_cnn_model_2(X_train.shape[1:])
bcnn_model_2.compile(loss=neg_log_likelihood, optimizer=Adam(lr), metrics=['accuracy'])
print("BCNN Model 2:")
bcnn_model_2.summary()

batch_size = 128
n_epochs = 30
hist_cnn = cnn_model.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, verbose=1)
hist_bcnn_1 = bcnn_model_1.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, verbose=1)
hist_bcnn_2 = bcnn_model_2.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, verbose=1)

Any idea why keras.fit() does not work for such BNN models?

I also implemented a ResNet with tfp layers, as shown here: https://github.com/zhulingchen/tfp-resnet/blob/master/tfp_resnet.py. That one actually works, which makes this even more confusing.

zhulingchen commented 5 years ago

Also, I have seen https://github.com/tensorflow/probability/issues/282, so I usually assign a weight to the loss, but that's another story.
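For reference, one common form of that weighting is to scale each layer's KL term by the size of the training set via the layers' kernel_divergence_fn argument. A rough sketch (illustrative only, not necessarily the exact fix discussed in #282):

kl_div_fn = lambda q, p, _: tfd.kl_divergence(q, p) / n_train  # n_train as in the example above

x = Convolution2DFlipout(32, kernel_size=3, padding="same", strides=2,
                         kernel_divergence_fn=kl_div_fn)(model_in)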

What is the root cause of the None gradient issue here?

zhulingchen commented 5 years ago

After a few hours of investigation, I realized the None gradient issue is caused by tf.enable_eager_execution(). After I commented that line out, keras.fit works.

What is the reason for this?

Also, I notice that when TF eager execution is enabled, all BNN weight parameters are tf.Variable objects, but the gradients corresponding to the *-Flipout layers are None, while the gradients for the other layers are tf.Tensor objects. When eager execution is disabled, all BNN weight parameters are tf.Variable objects and all of their gradients are tf.Tensor objects, with no None gradients.
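For reference, a minimal sketch of one way to inspect the per-variable gradients under eager execution (assuming the models, data, and loss functions defined above):

with tf.GradientTape() as tape:
    logits = bcnn_model_1(X_train[:32])
    loss = get_neg_log_likelihood_fn(bayesian=True)(
        tf.one_hot(y_train[:32].astype('int32'), n_class), logits)
grads = tape.gradient(loss, bcnn_model_1.trainable_variables)
for var, grad in zip(bcnn_model_1.trainable_variables, grads):
    # Under eager execution, the Flipout layers' variables show up with grad == None
    print(var.name, None if grad is None else type(grad).__name__)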

What is the mechanism?

SiegeLordEx commented 5 years ago

This sounds like https://github.com/tensorflow/probability/issues/467, which is fixed in the nightly pip packages. I tried your code with the tfp-nightly package and it appeared to work fine. Could you try that as well?


zhulingchen commented 5 years ago

Cool! I will try that ASAP.

By the way, why is there only tfp.layers.DenseVariational but no tfp.layers.Convolution[123]DVariational? Is it for some specific reason or are they still under development?

zhulingchen commented 5 years ago

After installing tf-nightly-gpu and tfp-nightly, I can train bcnn_model_1.

However, for bcnn_model_2.fit, I got the following error:

Train on 60000 samples
Epoch 1/30
2019-08-01 11:00:34.894546: W tensorflow/stream_executor/cuda/redzone_allocator.cc:311] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-08-01 11:00:34.948582: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: logits must be 2-D, but got shape [128,128,10]
     [[{{node loss_2/distribution_lambda_loss/distribution_lambda_Categorical/log_prob/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

The summary of bcnn_model_2 is:

BCNN Model 2:
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_flipout_2 (Conv2DFlip (None, 14, 14, 32)        608       
_________________________________________________________________
batch_normalization_4 (Batch (None, 14, 14, 32)        128       
_________________________________________________________________
activation_4 (Activation)    (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_flipout_3 (Conv2DFlip (None, 7, 7, 64)          36928     
_________________________________________________________________
batch_normalization_5 (Batch (None, 7, 7, 64)          256       
_________________________________________________________________
activation_5 (Activation)    (None, 7, 7, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 3136)              0         
_________________________________________________________________
dense_flipout_2 (DenseFlipou (None, 512)               3211776   
_________________________________________________________________
dense_flipout_3 (DenseFlipou (None, 10)                10250     
_________________________________________________________________
distribution_lambda (Distrib ((None,), (None,))        0         
=================================================================
Total params: 3,259,946
Trainable params: 3,259,754
Non-trainable params: 192
_________________________________________________________________

So what is this problem about?

bionicles commented 5 years ago

logits must be 2-D, but got shape [128,128,10] (from distribution_lambda_loss/distribution_lambda_Categorical/log_prob/SparseSoftmaxCrossEntropyWithLogits)

Initially I thought your problem was here, in the model 2 definition: model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x)  # distribution

but the output shape of dense_flipout_3 is (None, 10), so now I think it's actually in your loss function.

This looks like the loss you intended to use for the BCNN:

        def neg_log_likelihood_bayesian(y_true, y_pred):
            labels_distribution = tfp.distributions.Categorical(logits=y_pred)
            log_likelihood = labels_distribution.log_prob(tf.argmax(input=y_true, axis=1))
            loss = -tf.reduce_mean(input_tensor=log_likelihood)
            return loss
        return neg_log_likelihood_bayesian

but then the model is compiled with neg_log_likelihood:

bcnn_model_2.compile(loss=neg_log_likelihood, optimizer=Adam(lr), metrics=['accuracy'])

def neg_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)

Something in here has a shape of (128, 128, 10): tfd.Categorical, x, y_true, or y_pred. Either that, or it's a name collision between the loss functions and you're accidentally calling the wrong one.
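A small, self-contained sketch of the broadcasting I suspect (assuming Keras hands y_true to the loss with a trailing singleton dimension):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

logits = tf.zeros([128, 10])                 # per-example class logits
dist = tfd.Categorical(logits=logits)        # batch_shape=[128], event_shape=[]
labels = tf.zeros([128, 1], dtype=tf.int32)  # sparse labels as Keras passes them
# The (128, 1) labels broadcast against the [128] batch shape, so log_prob
# returns a (128, 128) tensor instead of (128,), which matches the
# "logits must be 2-D, but got shape [128,128,10]" error.
print(dist.log_prob(labels).shape)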

zhulingchen commented 5 years ago

Thanks for the detailed reply, bionicles.

Actually, I don't think I can use neg_log_likelihood_bayesian as the loss function for bcnn_model_2, because the model output is a tfp distribution rather than logits, so I cannot feed it into tfp.distributions.Categorical again. That's why I defined the much simpler but equivalent loss function neg_log_likelihood, which takes distributions as input, for bcnn_model_2.

zhulingchen commented 5 years ago

I still think there is something wrong with model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x) in the function

def build_bayesian_cnn_model_2(input_shape):
    model_in = Input(shape=input_shape)
    x = Convolution2DFlipout(32, kernel_size=3, padding="same", strides=2)(model_in)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Convolution2DFlipout(64, kernel_size=3, padding="same", strides=2)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = DenseFlipout(512, activation='relu')(x)
    x = DenseFlipout(10, activation=None)(x)  # logits
    model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x)  # distribution
    model = Model(model_in, model_out)
    return model

with the loss function

def neg_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)

but I don't know how to fix it as I see no similar examples.

By the way, y_train and y_test were not converted to one-hot encoding. They still have shapes (60000,) and (10000,).

zhulingchen commented 5 years ago

I found a similar issue at https://github.com/tensorflow/probability/issues/535 and, as indicated there, using tfd.Multinomial instead of tfd.Categorical solves the logits shape problem. I still do not know why.
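My guess at why (a sketch only, assuming one-hot labels are passed to log_prob): Multinomial treats the class axis as its event dimension, so log_prob returns one value per example instead of broadcasting the labels against the batch dimension the way Categorical does.

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

logits = tf.zeros([128, 10])
mult = tfd.Multinomial(total_count=1., logits=logits)
print(mult.event_shape)  # (10,): the class axis is part of the event

onehot_labels = tf.one_hot(tf.zeros([128], dtype=tf.int32), 10)
print(mult.log_prob(onehot_labels).shape)  # (128,): one log-probability per example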

nbro commented 5 years ago

See also the related issue https://github.com/tensorflow/tensorflow/issues/33729.

nbro commented 5 years ago

@zhulingchen Have you solved the original problem which you said was being caused by tf.compat.v1.enable_eager_execution()? In your notebook https://github.com/zhulingchen/tfp-tutorial/blob/master/tfp_bnn.ipynb, you're using tf.compat.v1.enable_eager_execution() and you seem to be training a Bayesian CNN with Keras' fit, without getting the error you originally mentioned in this issue, so I suppose you solved the original issue. But how?

In this notebook, you used TensorFlow 1.15. Have you tried to use TensorFlow 2?

Can you please summarise which problems you have encountered while attempting to train a Bayesian CNN with Keras' APIs (i.e. fit, compile, etc.), and which of these problems have already been solved?

gledsonmelotti commented 5 years ago

Hello, how are you? Did you solve your problem? I tried to adapt your model on GitHub to the Inception V3 CNN, but unfortunately I got the following error: ValueError: Variable <tf.Variable 'conv2d_flipout/kernel_posterior_loc:0' shape=(3, 3, 3, 32) dtype=float32> has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

I ran the CNN algorithm you provided and it worked correctly, but when using it with Inception V3 the above error arises.

I thank you for your attention.

waldnerf commented 3 years ago

This post worked for me