nbro opened this issue 4 years ago
I drilled down in the TensorFlow code. It's due to TensorFlow automatically creating a wrapper around your function: it casts and reshapes the model output (the distribution) to the type of the metric (which seems odd to me anyway). So, to prevent it, you should create your own wrapper that doesn't perform this cast. The code that does this is at: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/metrics.py#L583
Use that block of code as inspiration to make your own metric wrapper. This should be a feature of TFP.
@mcourteaux Thank you for this info (I was already suspecting this, btw). Have a look at the duplicate issue https://github.com/tensorflow/tensorflow/issues/36181. You should also provide this info there. Feel free to provide a (temporary) concrete solution to this problem.
For example, one can use this MeanMetricWrapper:

import six
from tensorflow import keras
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.utils.tf_utils import is_tensor_or_variable

class MeanMetricWrapper(keras.metrics.Mean):

    def __init__(self, fn, name=None, dtype=None, **kwargs):
        super(MeanMetricWrapper, self).__init__(name=name, dtype=dtype)
        self._fn = fn
        self._fn_kwargs = kwargs

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Call the wrapped function directly, without Keras' cast/reshape step.
        matches = self._fn(y_true, y_pred, **self._fn_kwargs)
        return super(MeanMetricWrapper, self).update_state(
            matches, sample_weight=sample_weight)

    def get_config(self):
        config = {}
        for k, v in six.iteritems(self._fn_kwargs):
            config[k] = K.eval(v) if is_tensor_or_variable(v) else v
        base_config = super(MeanMetricWrapper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
When defining your metrics, use this wrapper and pass your original lambda as the fn argument of the constructor.
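For instance, a minimal usage sketch (the negloglik lambda and the model are illustrative; this assumes the model's final layer is a tfp.layers.DistributionLambda or similar, so y_pred is a distribution):

negloglik = lambda y_true, rv_y: -rv_y.log_prob(y_true)

model.compile(optimizer='adam',
              loss=negloglik,
              metrics=[MeanMetricWrapper(negloglik, name='nll')])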
@brianwa84, @jvdillon Can you please confirm (or not) that the solution provided by @mcourteaux is the most appropriate workaround that currently exists?
I've trained a Bayesian neural network by early stopping it when the negative log-likelihood does not improve for several epochs. To do that, I pass my negative log-likelihood loss function (similar to the one defined above) to MeanMetricWrapper, i.e. MeanMetricWrapper(neg_log_likelihood), and then pass it to the metrics parameter of the compile method. I am not very familiar with MeanMetricWrapper (and, in general, with the metric classes and how they work or are supposed to be used), and training of my Bayesian model early stops after 5-6 epochs, so I think I am doing something wrong: I wouldn't expect the Bayesian model to overfit so easily.
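Concretely, the setup looks something like this (a sketch with illustrative names, not the exact code):

model.compile(optimizer='adam',
              loss=neg_log_likelihood,
              metrics=[MeanMetricWrapper(neg_log_likelihood, name='nll')])

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_nll', mode='min', patience=5)
model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[early_stop])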
@nbro are you still having issues? This workaround does seem to be doing the job for me (in particular, it agrees with the loss function value up to some difference I can attribute to the kernel divergence).
I would still like to figure out how to extract the KL divergence separately, but that might be harder since it depends on the weight distributions.
@joaocaldeira Well, as I say in the comment above, I was trying to use the NLL computed with the workaround above to early stop my model, but the model early stops too quickly, after 5-6 epochs (although my datasets are relatively small, i.e. 8k training and 8k test instances), whereas the non-Bayesian model doesn't even early stop. I was actually expecting the Bayesian NN not to overfit so quickly. Bayesian models can also overfit, of course, but people usually say that they are more robust to overfitting, although I don't know precisely what they mean by that.
@joaocaldeira Would it be possible to see your source code? Have you also encountered an overfitting situation?
No obvious overfitting, or particularly quick stopping. I'm not particularly happy with the uncertainties I get from the model at the moment, but that's a completely separate issue. The relevant code snippet is
def mlp_flipout(hidden_dim=100, n_layers=3, n_inputs=13, dropout_rate=0, kernel='kl'):
    input_img = tfkl.Input(n_inputs)
    x = input_img
    # Pick the weight divergence used by the flipout layers.
    if kernel == 'kl':
        kernel_fn = scaled_kl_fn
    elif kernel == 'mmd':
        kernel_fn = mmd_from_dists
    else:
        raise ValueError(f'Kernel {kernel} not defined!')
    for _ in range(n_layers):
        x = tfpl.DenseFlipout(hidden_dim, activation='relu', kernel_divergence_fn=kernel_fn)(x)
        if dropout_rate > 0:
            x = tfkl.Dropout(dropout_rate)(x)
    # Two outputs parameterize a Normal: location and (softplus-transformed) scale.
    x = tfpl.DenseFlipout(2, kernel_divergence_fn=kernel_fn)(x)
    x = tfpl.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                                                     scale=1e-3 + tf.math.softplus(t[..., 1:])))(x)
    model = tfk.Model(input_img, x)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-4), loss=negloglik,
                  metrics=['mse', MeanMetricWrapper(negloglik_met, name='nll')])
    return model
using the MeanMetricWrapper defined above.
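For reference, negloglik and negloglik_met are not defined in the snippet; a typical definition (an assumption here, not necessarily the exact functions used above) would be:

negloglik = lambda y, rv_y: -rv_y.log_prob(y)  # hypothetical: rv_y is the output distribution
negloglik_met = negloglik                      # same function, used as the wrapped metric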
@joaocaldeira I think I did exactly the same thing. What's the size of your dataset and what problem are you trying to solve? I just want to understand if this is related to my problem or not. I guess my dataset is too small or my model is too complex.
Large, ~90k, and a simple problem (really one which I could solve without a neural network); I just wanted to test the uncertainties that come out of this. My network is fully-connected, as above; if yours is convolutional, I guess that's a pretty big difference.
I tried the solution proposed by @mcourteaux but I stumbled on a different error. Here is my code:
import six
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.utils.tf_utils import is_tensor_or_variable
import tensorflow as tf
import tensorflow_probability as tfp

# https://github.com/tensorflow/probability/issues/742
class MeanMetricWrapper(tf.keras.metrics.Mean):

    def __init__(self, fn, name=None, dtype=None, **kwargs):
        super(MeanMetricWrapper, self).__init__(name=name, dtype=dtype)
        self._fn = fn
        self._fn_kwargs = kwargs

    def update_state(self, y_true, y_pred, sample_weight=None):
        matches = self._fn(y_true, y_pred, **self._fn_kwargs)
        return super(MeanMetricWrapper, self).update_state(
            matches, sample_weight=sample_weight)

    def get_config(self):
        config = {}
        for k, v in six.iteritems(self._fn_kwargs):
            config[k] = K.eval(v) if is_tensor_or_variable(v) else v
        base_config = super(MeanMetricWrapper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

negloglik = lambda p_y, y: -p_y.log_prob(y)
negloglik_w = MeanMetricWrapper(negloglik)

class BetaBinomial(tf.keras.layers.Layer):

    def __init__(self):
        super(BetaBinomial, self).__init__()

    def build(self, input_shape):
        self.alpha = self.add_weight(shape=(), trainable=True, initializer=tf.keras.initializers.Ones())
        self.beta = self.add_weight(shape=(), trainable=True, initializer=tf.keras.initializers.Ones())
        self.posterior = tfp.layers.DistributionLambda(lambda clicks: tfp.distributions.DirichletMultinomial(
            tf.cast(clicks, tf.float32), [self.alpha, self.beta], validate_args=False, allow_nan_stats=True,
            name='DirichletMultinomial'))

    def call(self, inputs):
        return self.posterior(inputs)

clicks = tf.keras.layers.Input(name='clicks', shape=(), dtype=tf.int64)
posterior = BetaBinomial()(clicks)
m = tf.keras.Model(inputs=[clicks], outputs=posterior)
m.summary()
m.compile(loss=negloglik_w, optimizer='adam')
yet I get the error
WARNING:tensorflow:AutoGraph could not transform <bound method BetaBinomial.call of <__main__.BetaBinomial object at 0x83e6a8ed0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Unable to locate the source code of <bound method BetaBinomial.call of <__main__.BetaBinomial object at 0x83e6a8ed0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2020-04-16 12:23:51.199497: W tensorflow/python/util/util.cc:319] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:tensorflow:
The following Variables were used a Lambda layer's call (distribution_lambda), but
are not present in its tracked objects:
<tf.Variable 'beta_binomial/Variable:0' shape=() dtype=float32>
<tf.Variable 'beta_binomial/Variable:0' shape=() dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
clicks (InputLayer) [(None,)] 0
_________________________________________________________________
beta_binomial (BetaBinomial) (None, 2) 2
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
File "<input>", line 47, in <module>
File "/Users/cdalmaso/opt/anaconda3/envs/tfp/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/Users/cdalmaso/opt/anaconda3/envs/tfp/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 446, in compile
self._compile_weights_loss_and_weighted_metrics()
File "/Users/cdalmaso/opt/anaconda3/envs/tfp/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/Users/cdalmaso/opt/anaconda3/envs/tfp/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1592, in _compile_weights_loss_and_weighted_metrics
self.total_loss = self._prepare_total_loss(masks)
File "/Users/cdalmaso/opt/anaconda3/envs/tfp/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1652, in _prepare_total_loss
per_sample_losses = loss_fn.call(y_true, y_pred)
TypeError: call() takes 2 positional arguments but 3 were given
This is running on tensorflow==2.1.0 and tensorflow-probability==0.9.0.
I am also confused: is the wrapper intended for the loss or for the metrics?
Thanks in advance
@mcourteaux's code does not work with TF 2, since the eval and is_tensor_or_variable functions are not available there. Any other workaround? And when is this going to be fixed?
@Strateus I used the solution described in the comment https://github.com/tensorflow/probability/issues/742#issuecomment-580433644 and it worked for me with TF 2. You need to import that function with from tensorflow.python.keras.utils.tf_utils import is_tensor_or_variable, and use tf.keras.backend.eval rather than just eval (or, equivalently, import eval from the Keras backend). Here's the full solution:
import six
import tensorflow as tf
from tensorflow.python.keras.utils.tf_utils import is_tensor_or_variable

class MetricWrapper(tf.keras.metrics.Mean):

    def __init__(self, fn, name="my_metric", dtype=None, **kwargs):
        super(MetricWrapper, self).__init__(name=name, dtype=dtype)
        self._fn = fn
        self._fn_kwargs = kwargs

    def update_state(self, y_true, y_pred, sample_weight=None):
        matches = self._fn(y_true, y_pred, **self._fn_kwargs)
        return super(MetricWrapper, self).update_state(matches, sample_weight=sample_weight)

    def get_config(self):
        config = {}
        for k, v in six.iteritems(self._fn_kwargs):
            config[k] = tf.keras.backend.eval(v) if is_tensor_or_variable(v) else v
        base_config = super(MetricWrapper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
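Usage is as before (a sketch; model and negloglik stand for the model and loss function from earlier in the thread):

model.compile(optimizer='adam',
              loss=negloglik,
              metrics=[MetricWrapper(negloglik, name='nll')])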
@nbro Thanks, I used a different workaround: MultivariateNormalTriL instead of DenseFlipout.
Thank you for your hints. I tried it too, but it still fails:
171 loss={'loss_1': negloglik, 'loss_2': MetricWrapper(negloglik, name='nll')},
--> 172 loss_weights={'loss_1': 1., 'loss_2': 1.}
173 )
tensorflow_core\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
tensorflow_core\python\keras\engine\training.py in compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, weighted_metrics, target_tensors, distribute, **kwargs)
444
445 # Creates the model loss and weighted metrics sub-graphs.
--> 446 self._compile_weights_loss_and_weighted_metrics()
447
448 # Functions for train, test and predict will
tensorflow_core\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
tensorflow_core\python\keras\engine\training.py in _compile_weights_loss_and_weighted_metrics(self, sample_weights)
1590 # loss_weight_2 * output_2_loss_fn(...) +
1591 # layer losses.
-> 1592 self.total_loss = self._prepare_total_loss(masks)
1593
1594 def _prepare_skip_target_masks(self):
tensorflow_core\python\keras\engine\training.py in _prepare_total_loss(self, masks)
1650
1651 if hasattr(loss_fn, 'reduction'):
-> 1652 per_sample_losses = loss_fn.call(y_true, y_pred)
1653 weighted_losses = losses_utils.compute_weighted_loss(
1654 per_sample_losses,
TypeError: call() takes 2 positional arguments but 3 were given
loss={'loss_1': negloglik, 'loss_2': MetricWrapper(negloglik, name='nll')}
To be honest, I didn't fully read your traceback, but this line seems to suggest that you're not using MetricWrapper for the first loss. Maybe do the following:
loss={'loss_1': MetricWrapper(negloglik, name='nll1'), 'loss_2': MetricWrapper(negloglik, name='nll2')}
Nope, my first loss does not need it, it works without this wrapper.
@Strateus But you're using negloglik in both cases (i.e. the same loss): you're passing negloglik to MetricWrapper and also using negloglik directly.
It is the same function, used for 2 different outputs, yes. And it works fine with MultivariateNormalTriL, but does not work with DenseFlipout.
@Strateus But I am suggesting that you use MetricWrapper for loss_1 too, to avoid the error you describe above. Otherwise, why would you need MetricWrapper in the first place if you can use negloglik directly?
I cannot use negloglik directly with DenseFlipout, because its output does not have log_prob somehow. So I either need to replace DenseFlipout with MultivariateNormalTriL, or use this wrapper (as I thought, but it does not work).
@Strateus My question is: why don't you use MetricWrapper for loss_1 too? That's what I've not yet understood.
Why would I need to do this if it works OK directly? Occam's razor: I don't use a wrapper if I don't need it.
@Strateus That's why I asked another question above: why do you use it for loss_2 then, if it works directly? I think there's a big misunderstanding here.
It works directly only with loss_1, because that head is a MultivariateNormalTriL layer. It does not work directly with loss_2, which is a DenseFlipout layer.
To rephrase: DenseFlipout has some bug that has not yet been fixed, which is why I am here asking this question. If there were no bug in TFP, I would not need this wrapper stuff, which is obviously a workaround.
@Strateus Ha, so you have something like a fork of layers in your model. But have you tried wrapping both losses with MetricWrapper and using dense layers in both cases?
The bug is not in the dense layer, I think. The problem is that Keras/TensorFlow was not programmed to deal with models that return a distribution.
@nbro @Strateus I used the same MetricWrapper that you shared, with tf==2.0 and tfp==0.9.0. The model architecture is:
prior = tfd.Independent(tfd.Normal(loc=tf.zeros(time_steps_output, dtype=tf.float32), scale=1.0),
                        reinterpreted_batch_ndims=1)

model = keras.Sequential()
model.add(keras.layers.Input(shape=X_shape[-2:]))
model.add(keras.layers.GRU(256, activation="relu", return_sequences=True))
model.add(keras.layers.GRU(128, activation="relu", return_sequences=True))
model.add(keras.layers.GRU(128, activation="relu", return_sequences=True))
model.add(keras.layers.GRU(64, activation="relu", recurrent_dropout=0.4))
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dense(tfp.layers.MultivariateNormalTriL.params_size(time_steps_output),
                             activation=None, name="distribution_weights"))
model.add(tfp.layers.MultivariateNormalTriL(time_steps_output,
                                            activity_regularizer=tfp.layers.KLDivergenceRegularizer(
                                                prior, weight=1 / n_batches),
                                            name="output"))
This is my compile step and neg_log_likelihood():

def neg_log_likelihood(y_true, y_pred):
    return -tf.reduce_mean(y_pred.log_prob(tf.cast(tf.argmax(y_true, axis=-1), tf.int32)))

def fit_model(model, data_train, data_val, n_epochs):
    parallel_model = multi_gpu_model(model, gpus=2)
    # Define early stopper
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
    parallel_model.compile(
        loss=MetricWrapper(neg_log_likelihood),
        optimizer=keras.optimizers.Adam(0.000001)
        # metrics=[MetricWrapper(neg_log_likelihood)]
    )
    history = parallel_model.fit(
        data_train,
        epochs=n_epochs,
        validation_data=data_val,
        verbose=0,
        shuffle=False,
        callbacks=[es]
    )
Then I also got this error:
Traceback (most recent call last):
File "event_mapping_model_training.py", line 318, in
@ShikhaSingh10 I don't know if this is the problem, but I am using the functional API (rather than the sequential one). I am also using DistributionLambda, to which I pass the output distribution.
@nbro These are the environment details:
tensorboard 2.2.2
tensorboard-plugin-wit 1.7.0
tensorflow 2.2.0
tensorflow-estimator 2.2.0
tensorflow-probability 0.10.0
With the same design, I am now facing this issue:
AttributeError                            Traceback (most recent call last)
I think I found a solution without using MetricWrapper: I convert the tensor into a distribution before calculating neg_log_likelihood. In the following code, I convert the y_pred tensor into the MultivariateNormalTriL distribution, which is the expected output distribution (by the time the loss is called, y_pred has already been converted to a tensor):

def neg_log_likelihood(y_true, y_pred):
    y_pred = tfp.distributions.MultivariateNormalTriL(y_pred)
    return -tf.reduce_mean(y_pred.log_prob(y_true))
I'm experiencing this problem actually after training, at model.save() time, with tf 2.3 and tfp 0.11:
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.1
tensorflow-estimator==2.3.0
tensorflow-probability==0.11.1
Minimal example:
import tensorflow.keras as tfk
import tensorflow_probability as tfp
import tensorflow as tf

input_dim = 2
latent_dim = 1
prior = tfp.distributions.MultivariateNormalDiag(loc=tf.zeros(latent_dim))

encoder = tfk.Sequential([
    tfk.layers.InputLayer(input_shape=[input_dim]),
    tfk.layers.Dense(units=tfp.layers.MultivariateNormalTriL.params_size(latent_dim)),
    tfp.layers.MultivariateNormalTriL(
        event_size=latent_dim,
        activity_regularizer=tfp.layers.KLDivergenceRegularizer(prior),
    )
])

decoder = tfk.Sequential([
    tfk.layers.InputLayer(input_shape=[latent_dim]),
    tfk.layers.Dense(units=tfp.layers.MultivariateNormalTriL.params_size(input_dim)),
    tfp.layers.MultivariateNormalTriL(event_size=input_dim)
])

VAE = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs[0]))

def loss(x, x_rv):
    return -tf.reduce_sum(x_rv.log_prob(x))

VAE.compile(loss=loss)

data = tf.random.uniform(shape=(500, input_dim))
VAE.fit(x=data, y=data, epochs=2)
print("training done")
VAE.save(filepath="mymodel")  # Failure
Specifically, it seems the KLDivergenceRegularizer is being passed a Tensor instead of a distribution, and that is triggering the error. Even after wrapping my loss function with MeanMetricWrapper I get the same AttributeError :\ Am I missing some way to make the workaround work? Thanks in advance!
I managed to get save to work by making the following modifications to the file tensorflow_probability/python/layers/distribution_layer.py. I would really appreciate it if this bug were fixed, as storing and loading models is essential for me.

First, change the _make_kl_divergence_fn function so that it avoids computing the divergence when the input is a Tensor object, like this:

with tf.name_scope('kldivergence_loss'):
    if isinstance(distribution_a, tf.Tensor):
        return 0.0
    ...

Second, exchange the KLDivergenceRegularizer for a KLDivergenceAddLoss layer, or add a get_config method to the KLDivergenceRegularizer object:

def get_config(self):
    config = {'use_exact_kl': self._use_exact_kl,
              'test_points_reduce_axis': self._test_points_reduce_axis,
              'weight': self._weight}
    return dict(list(config.items()))
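For reference, the KLDivergenceAddLoss option would look roughly like this in the minimal VAE example above (a sketch, reusing prior, input_dim and latent_dim from that example):

encoder = tfk.Sequential([
    tfk.layers.InputLayer(input_shape=[input_dim]),
    tfk.layers.Dense(units=tfp.layers.MultivariateNormalTriL.params_size(latent_dim)),
    tfp.layers.MultivariateNormalTriL(event_size=latent_dim),
    # Adds the KL term as a layer loss instead of an activity regularizer.
    tfp.layers.KLDivergenceAddLoss(prior),
])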
Sadly, I found another bug when loading the model, even though I can save it. I was going to prepare a colab for that, but I don't know how to easily change already-installed TensorFlow Probability code on the cloud machine. In the meantime, here is a short example, where I was trying different commented lines to fix the problem, but it always fails.
This issue persists in TensorFlow Probability version 0.12.1; I have to update the code as mentioned here to make it work.
This bug has to be fixed. Not being able to save TF Probability models is a major issue, I think! And the fix seems easy enough, as per @cserpell's snippet.
Still an issue today using tfp 0.15.0 and TF 2.7.0
Same problem using tfp 0.16.0 and tf 2.9.1
Still an issue today using tfp 0.17.0 and tf 2.9.2
Still an issue using tfp 0.18.0
Change the model output: add the distribution parameters alongside the original predictions. Then, in the custom loss function, construct exactly the same tfd distribution, supply it with those parameters, and restore the log_prob part. Finally, trim off the unneeded parameters from model.predict's output to get the usual predictions (see the sketch after the snippet below).
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import concatenate

tfd = tfp.distributions
DenseFlipout = tfp.layers.DenseFlipout
DistributionLambda = tfp.layers.DistributionLambda

class TPNormal_layer(tf.keras.layers.Layer):

    def __init__(self, n_outputs):
        super(TPNormal_layer, self).__init__()
        kernel_divergence_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p)
        bias_divergence_fn = kernel_divergence_fn
        # The flipout layer produces 3 parameters (loc, scale, skewness) per output.
        self.parameters = DenseFlipout(n_outputs * 3, kernel_divergence_fn=kernel_divergence_fn,
                                       bias_divergence_fn=bias_divergence_fn)
        make_distribution_fn = lambda t: tfd.TwoPieceNormal(
            loc=t[..., :n_outputs],
            scale=1e-3 + tf.math.softplus(0.05 * t[..., n_outputs:n_outputs * 2]),
            skewness=tf.math.softplus(0.05 * t[..., n_outputs * 2:n_outputs * 3]))
        self.samples = DistributionLambda(make_distribution_fn)

    def call(self, inputs):
        x = self.parameters(inputs)
        output = self.samples(x)
        # Return the sampled predictions together with the raw distribution parameters.
        return concatenate([output, x])

outputs = TPNormal_layer(n_outputs=3)(all_inputs)  # all_inputs defined elsewhere
model = tf.keras.Model(inputs=all_inputs, outputs=outputs)

def MyLoss():
    def loss(y_true, y_pred):
        n_outputs = 3
        # Split off the raw parameters appended to the model output.
        yparams = y_pred[..., n_outputs:]
        # Rebuild exactly the same distribution from those parameters.
        distfn = lambda t: tfd.TwoPieceNormal(
            loc=t[..., :n_outputs],
            scale=1e-3 + tf.math.softplus(0.05 * t[..., n_outputs:n_outputs * 2]),
            skewness=tf.math.softplus(0.05 * t[..., n_outputs * 2:n_outputs * 3]))
        ypdist = distfn(yparams)
        return -tf.reduce_mean(ypdist.log_prob(y_true))
    return loss

model.compile(loss=MyLoss(), optimizer=adam)  # adam defined elsewhere
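And the trimming step mentioned above (a sketch; with n_outputs=3 as in the snippet, the output has 3 sampled predictions followed by 9 raw parameters, and x stands for your input data):

preds = model.predict(x)   # shape (..., n_outputs * 4): samples + 3 * n_outputs parameters
y_hat = preds[..., :3]     # keep only the sampled predictions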
The problem still occurs with tensorflow==2.15 and tensorflow-probability==0.23. None of the approaches above works for me. Exporting for inference works, but the training signature throws with AttributeError: 'SymbolicTensor' object has no attribute 'log_prob'.
The following code produces the error with TF 2.1 and TFP 0.9. This error seems to be due to the fact that y_pred is a tensor when the loss is called, while it should be a distribution. Meanwhile, I found a question on Stack Overflow related to the third issue I mentioned above.
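To illustrate the mismatch (a sketch, not the code from the links above):

negloglik = lambda y_true, rv_y: -rv_y.log_prob(y_true)

# By the time Keras invokes this as a loss/metric, rv_y has already been
# converted to a plain tensor, so rv_y.log_prob fails with an AttributeError
# like the 'SymbolicTensor' one quoted earlier in the thread.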