zhulingchen opened this issue 5 years ago
Also, I have seen issue https://github.com/tensorflow/probability/issues/282, so usually I assign a weight to the loss, but that's another story.
What is the root cause of the None gradient issue here?
After a few hours' investigation, I realized the None gradient issue is caused by tf.enable_eager_execution(). After I commented this line out, tf.keras.fit works.
What is the reason for this?
Also, I realized that when TF Eager Execution is enabled, all BNN weight parameters are of type tf.Variable, but the gradients corresponding to the *-flipout layers are None while the gradients for the other layers are of type tf.Tensor. However, when TF Eager Execution is disabled, all BNN weight parameters are of type tf.Variable and all corresponding gradients are of type tf.Tensor, without any None gradients.
What is the mechanism?
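For reference, this is roughly how I checked which gradients come back as None under eager execution (model, x_batch, and y_batch are placeholder names here, not the actual names from my script):

import tensorflow as tf

with tf.GradientTape() as tape:
    logits = model(x_batch, training=True)  # model built with tfp *-Flipout layers
    nll = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(y_batch, logits, from_logits=True))
    loss = nll + sum(model.losses)  # include the KL terms added by the flipout layers

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    print(var.name, 'None' if grad is None else type(grad).__name__)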
This sounds like https://github.com/tensorflow/probability/issues/467, which is fixed in the nightly pip packages. I tried your code with the tfp-nightly package and it appeared to work fine; could you try this as well?
Cool! I will try that ASAP.
By the way, why is there only tfp.layers.DenseVariational but no tfp.layers.Convolution[123]DVariational? Is there a specific reason, or are they still under development?
After installing tf-nightly-gpu and tfp-nightly, I can train bcnn_model_1.
However, for bcnn_model_2.fit, I got the following error:
Train on 60000 samples
Epoch 1/30
2019-08-01 11:00:34.894546: W tensorflow/stream_executor/cuda/redzone_allocator.cc:311] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-08-01 11:00:34.948582: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: logits must be 2-D, but got shape [128,128,10]
[[{{node loss_2/distribution_lambda_loss/distribution_lambda_Categorical/log_prob/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
The summary of bcnn_model_2 is:
BCNN Model 2:
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_flipout_2 (Conv2DFlip (None, 14, 14, 32) 608
_________________________________________________________________
batch_normalization_4 (Batch (None, 14, 14, 32) 128
_________________________________________________________________
activation_4 (Activation) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_flipout_3 (Conv2DFlip (None, 7, 7, 64) 36928
_________________________________________________________________
batch_normalization_5 (Batch (None, 7, 7, 64) 256
_________________________________________________________________
activation_5 (Activation) (None, 7, 7, 64) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 3136) 0
_________________________________________________________________
dense_flipout_2 (DenseFlipou (None, 512) 3211776
_________________________________________________________________
dense_flipout_3 (DenseFlipou (None, 10) 10250
_________________________________________________________________
distribution_lambda (Distrib ((None,), (None,)) 0
=================================================================
Total params: 3,259,946
Trainable params: 3,259,754
Non-trainable params: 192
_________________________________________________________________
So what is this problem about?
logits must be 2-D, but got shape [128,128,10] (from distribution_lambda_loss/distribution_lambda_Categorical/log_prob/SparseSoftmaxCrossEntropyWithLogits)
Initially I thought your problem was here, in the bcnn_model_2 definition:
model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x) # distribution
but the output shape of dense_flipout_3 is (None, 10), so now I think it's actually in your loss function.
This looks like the function you intended to use for the BCNN:
def neg_log_likelihood_bayesian(y_true, y_pred):
    labels_distribution = tfp.distributions.Categorical(logits=y_pred)
    log_likelihood = labels_distribution.log_prob(tf.argmax(input=y_true, axis=1))
    loss = -tf.reduce_mean(input_tensor=log_likelihood)
    return loss
But then the model is compiled with neg_log_likelihood:
bcnn_model_2.compile(loss=neg_log_likelihood, optimizer=Adam(lr), metrics=['accuracy'])
def neg_log_likelihood(y_true, y_pred):
return -y_pred.log_prob(y_true)
Something in here has a shape of [128, 128, 10]: tfd.Categorical, x, y_true, or y_pred. That, or it's a name collision between the loss functions and you're accidentally calling the wrong one.
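To make the shape hunch concrete, here's a tiny check (the [128, 1] label shape is my guess at what Keras hands the loss, not something pulled from your run):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

dist = tfd.Categorical(logits=tf.zeros([128, 10]))  # batch_shape=[128], 10 classes
labels = tf.zeros([128, 1], dtype=tf.int32)         # guessed shape of y_true from fit
print(dist.log_prob(labels).shape)                  # (128, 128): the labels broadcast against
                                                    # the batch, tiling the logits to [128, 128, 10]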
Thanks for the detailed reply, bionicles.
Actually, I don't think I can use neg_log_likelihood_bayesian as the loss function for bcnn_model_2, because that model's output is a tfp.distributions object instead of logits, so I cannot feed it into tfp.distributions.Categorical again. That's why I defined the much simpler but equivalent loss function neg_log_likelihood, which takes a distribution as input, for bcnn_model_2.
I still think there is something wrong with model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x) in the function
def build_bayesian_cnn_model_2(input_shape):
model_in = Input(shape=input_shape)
x = Convolution2DFlipout(32, kernel_size=3, padding="same", strides=2)(model_in)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Convolution2DFlipout(64, kernel_size=3, padding="same", strides=2)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Flatten()(x)
x = DenseFlipout(512, activation='relu')(x)
x = DenseFlipout(10, activation=None)(x) # logits
model_out = DistributionLambda(lambda t: tfd.Categorical(logits=t))(x) # distribution
model = Model(model_in, model_out)
return model
with the loss function
def neg_log_likelihood(y_true, y_pred):
return -y_pred.log_prob(y_true)
but I don't know how to fix it as I see no similar examples.
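The only workaround I can think of (an untested guess on my side, assuming Keras hands the loss a y_true of shape (batch_size, 1)) would be to flatten and cast the labels inside the loss before calling log_prob:

def neg_log_likelihood(y_true, y_pred):
    # y_pred is the tfd.Categorical coming out of the DistributionLambda layer.
    # Squeeze the trailing label dimension and cast to int so the label shape
    # matches the distribution's batch shape instead of broadcasting against it.
    labels = tf.squeeze(tf.cast(y_true, tf.int32), axis=-1)
    return -y_pred.log_prob(labels)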
By the way, y_train and y_test were not converted to one-hot encoding. They are still of shape (60000,) and (10000,), respectively.
I found a similar issue at https://github.com/tensorflow/probability/issues/535 and, as indicated there, using tfd.Multinomial instead of tfd.Categorical can solve the logits shape problem. I still do not know why.
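For completeness, this is roughly what the swap looks like (my own reading of that thread; the total_count=1 value and the one-hot labels are my assumptions, not something the other issue spells out):

# Output head: a Multinomial with event shape (10,), so log_prob returns one
# value per sample instead of broadcasting the sparse labels against the batch.
model_out = DistributionLambda(
    lambda t: tfd.Multinomial(total_count=1., logits=t))(x)

# The labels then have to be one-hot encoded to match the 10-dimensional event:
y_train_onehot = tf.one_hot(y_train, depth=10)

# The loss can stay as it is:
def neg_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)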
See also the related issue https://github.com/tensorflow/tensorflow/issues/33729.
@zhulingchen Have you solved the original problem, which you said was caused by tf.compat.v1.enable_eager_execution()? In your notebook https://github.com/zhulingchen/tfp-tutorial/blob/master/tfp_bnn.ipynb, you're using tf.compat.v1.enable_eager_execution() and you seem to be training a Bayesian CNN with Keras' fit without getting the error you originally mentioned in this issue, so I suppose you solved it. But how?
In that notebook, you used TensorFlow 1.15. Have you tried TensorFlow 2?
Can you please summarise which problems you have encountered while attempting to train a Bayesian CNN with Keras' APIs (i.e. fit, compile, etc.) and which of these problems have already been solved?
Hello, how are you? Did you solve your problem? I tried to adapt the Inception V3 CNN to your model from GitHub, but unfortunately I got the following error:
ValueError: Variable <tf.Variable 'conv2d_flipout/kernel_posterior_loc:0' shape=(3, 3, 3, 32) dtype=float32> has None for gradient. Please make sure that all of your ops have a defined gradient (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I ran the CNN algorithm you provided and it worked correctly, but when using it with the Inception V3 CNN the above error arises.
Thank you for your attention.
I am using tfp *-Flipout layers to construct a Bayesian neural network (BNN) and combining it with keras.fit to train. I define the BNN structure in a very similar way to a CNN, but the keras.fit() function returns a None gradient issue as follows:
I am using the following versions of tfp and tf:
Below is a minimal working example on the MNIST dataset. Feel free to comment out the working CNN part to see the BNN error above (either bcnn_model_1 or bcnn_model_2 throws the above None gradient error when calling its fit function).
Any idea why keras.fit() is not able to work for such BNN models?
I also implemented a ResNet with tfp layers, as shown here: https://github.com/zhulingchen/tfp-resnet/blob/master/tfp_resnet.py. That one really did work, so this is starting to confuse me.