kyle-dorman / bayesian-neural-network-blogpost

Building a Bayesian deep learning classifier
https://medium.com/towards-data-science/building-a-bayesian-deep-learning-classifier-ece1845bc09

The categorical_cross_entropy in the Bayesian loss function is wrong. #6

Open bzhong2 opened 5 years ago

bzhong2 commented 5 years ago

The function definition is: tf.keras.backend.categorical_crossentropy( target, output, from_logits=False, axis=-1 )

But the code on line 68 is: undistorted_loss = K.categorical_crossentropy(pred, true, from_logits=True)
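
A minimal sketch of the corrected call (assuming TensorFlow 2.x; the toy tensors are made up for illustration), with the one-hot target passed first and the raw logits second:

import tensorflow as tf
from tensorflow.keras import backend as K

# Toy data for illustration only: a one-hot target and raw (pre-softmax) logits.
true = tf.constant([[0., 1., 0.]])
pred = tf.constant([[2.0, 1.0, 0.1]])

# The signature is categorical_crossentropy(target, output, from_logits=False, axis=-1),
# so the target must come first; from_logits=True means `pred` holds raw logits
# and softmax is applied internally.
undistorted_loss = K.categorical_crossentropy(true, pred, from_logits=True)
print(undistorted_loss.numpy())  # per-sample cross-entropy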

sazya commented 5 years ago

It might not be the same issue as bzhong2's, but...

On line 59 of bin/train.py, the loss seems to add up the logits_variance and softmax_output losses:

model.compile(
    optimizer=Adam(lr=1e-3, decay=0.001),
    loss={
        'logits_variance': bayesian_categorical_crossentropy(FLAGS.monte_carlo_simulations, num_classes),
        'softmax_output': 'categorical_crossentropy'
    },
    metrics={'softmax_output': metrics.categorical_accuracy},
    loss_weights={'logits_variance': .2, 'softmax_output': 1.})

But line 78 of bnn/loss_equations.py is

return variance_loss + undistorted_loss + variance_depressor

bayesian_categorical_crossentropy includes undistorted_loss, which is the same as the softmax_output loss. Is this a double count?

And is it related to bzhong2's issue? If we rewrite undistorted_loss as

undistorted_loss = K.categorical_crossentropy(pred, true, from_logits=True)
 -> undistorted_loss = K.categorical_crossentropy(true, pred, from_logits=True)

can we use only the logits_variance loss?
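
A hedged sketch of that suggestion, reusing the compile() snippet quoted above (FLAGS, num_classes and the output names are assumed from bin/train.py): keep the softmax_output head for the accuracy metric but give it zero loss weight, so only the logits_variance loss, which already contains the undistorted cross-entropy, is optimized.

model.compile(
    optimizer=Adam(lr=1e-3, decay=0.001),
    loss={
        # bayesian_categorical_crossentropy already includes undistorted_loss
        'logits_variance': bayesian_categorical_crossentropy(FLAGS.monte_carlo_simulations, num_classes),
        'softmax_output': 'categorical_crossentropy'
    },
    metrics={'softmax_output': metrics.categorical_accuracy},
    # zero weight on softmax_output avoids counting the cross-entropy twice
    loss_weights={'logits_variance': 1., 'softmax_output': 0.})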

SivagopinathreddyVinta commented 5 years ago

I am also having the same issue.

The function definition is: tf.keras.backend.categorical_crossentropy( target, output, from_logits=False, axis=-1 )

But the code on line 68 is: undistorted_loss = K.categorical_crossentropy(pred, true, from_logits=True)

GKalliatakis commented 4 years ago

Hi @sazya, have you found out whether bayesian_categorical_crossentropy double counts the softmax_output loss?

Regarding from_logits being set to True (taken from the Keras docs) --> from_logits: Boolean, whether 'output' is the result of a softmax, or is a tensor of logits.
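
A small toy check (my own example, not from the repo) of what that flag means: with the argument order fixed, passing raw logits with from_logits=True should match passing softmax probabilities with from_logits=False.

import tensorflow as tf
from tensorflow.keras import backend as K

true = tf.constant([[0., 1., 0.]])       # one-hot target
logits = tf.constant([[2.0, 1.0, 0.1]])  # raw, pre-softmax outputs

loss_from_logits = K.categorical_crossentropy(true, logits, from_logits=True)
loss_from_probs = K.categorical_crossentropy(true, K.softmax(logits), from_logits=False)
# The two values agree up to floating-point error.
print(loss_from_logits.numpy(), loss_from_probs.numpy())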