kyle-dorman / bayesian-neural-network-blogpost

Building a Bayesian deep learning classifier
https://medium.com/towards-data-science/building-a-bayesian-deep-learning-classifier-ece1845bc09

Negative loss & logits_variance_loss #12

Open GKalliatakis opened 4 years ago

GKalliatakis commented 4 years ago

Hi, I have created a Bayesian CNN classifier as described in this repo, but my model's overall loss is always negative, and so is the logits_variance_loss (see screenshot below). Any idea why that is happening?

[Screenshot from 2020-01-20 14-49-19: training log showing the negative loss and logits_variance_loss values]

pranavpandey2511 commented 4 years ago

@GKalliatakis Hi, could you please share the code for the loss function you wrote, along with the training loop code?

GKalliatakis commented 4 years ago

The loss function is exactly the one described in this repo:


# Assumed imports for this snippet (Keras 2 / TF 1.x era, matching the original repo):
import numpy as np
from keras import backend as K
from tensorflow.contrib import distributions  # or tensorflow_probability.distributions on newer TF

# Bayesian categorical cross entropy.
# N data points, C classes, T monte carlo simulations
# true - true values. Shape: (N, C)
# pred_var - predicted logit values and variance. Shape: (N, C + 1)
# returns - loss (N,)
def bayesian_categorical_crossentropy(T, num_classes):
  def bayesian_categorical_crossentropy_internal(true, pred_var):
    # shape: (N, 1)
    std = K.sqrt(pred_var[:, num_classes:])
    # shape: (N,)
    variance = pred_var[:, num_classes]
    variance_depressor = K.exp(variance) - K.ones_like(variance)
    # shape: (N, C)
    pred = pred_var[:, 0:num_classes]
    # shape: (N,)
    undistorted_loss = K.categorical_crossentropy(pred, true, from_logits=True)
    # shape: (T,)
    iterable = K.variable(np.ones(T))
    dist = distributions.Normal(loc=K.zeros_like(std), scale=std)
    monte_carlo_results = K.map_fn(
      gaussian_categorical_crossentropy(true, pred, dist, undistorted_loss, num_classes),
      iterable, name='monte_carlo_results')

    variance_loss = K.mean(monte_carlo_results, axis=0) * undistorted_loss

    return variance_loss + undistorted_loss + variance_depressor

  return bayesian_categorical_crossentropy_internal

# for a single monte carlo simulation, 
#   calculate categorical_crossentropy of 
#   predicted logit values plus gaussian 
#   noise vs true values.
# true - true values. Shape: (N, C)
# pred - predicted logit values. Shape: (N, C)
# dist - normal distribution to sample from. Shape: (N, C)
# undistorted_loss - the crossentropy loss without variance distortion. Shape: (N,)
# num_classes - the number of classes. C
# returns - total differences for all classes (N,)
def gaussian_categorical_crossentropy(true, pred, dist, undistorted_loss, num_classes):
  def map_fn(i):
    std_samples = K.transpose(dist.sample(num_classes))
    distorted_loss = K.categorical_crossentropy(pred + std_samples, true, from_logits=True)
    diff = undistorted_loss - distorted_loss
    return -K.elu(diff)
  return map_fn

Then the model was compiled with the following settings (again as described in this repo):

        # Compile the model using two losses, one is the aleatoric uncertainty loss function
        # and the other is the standard categorical cross entropy function.
        self.model.compile(
            optimizer=Adam(lr=1e-3, decay=0.001),
            # optimizer=SGD(lr=1e-5, momentum=0.9),
            loss={'logits_variance': bayesian_categorical_crossentropy(self.monte_carlo_simulations, self.classes),  # aleatoric uncertainty loss function
                  'softmax_output': 'categorical_crossentropy'  # standard categorical cross entropy function
                  # 'softmax_output': standard_categorical_cross_entropy  # standard categorical cross entropy function
                  },
            metrics={'softmax_output': metrics.categorical_accuracy},
            # the aleatoric uncertainty loss function is weighted less than the categorical cross entropy loss
            # because the aleatoric uncertainty loss includes the categorical cross entropy loss as one of its terms.
            loss_weights={'logits_variance': .2, 'softmax_output': 1.}
        )

The only thing I am concerned about, and which differs from the implementation described here, is the way raw images are fed in during training. Because this is a multi-output model, and unlike the author of this repo (who works with a smaller dataset and can simply call model.fit), I have created a custom generator:

def multiple_outputs(generator, image_dir, batch_size, image_size, subset):
    gen = generator.flow_from_directory(
        image_dir,
        target_size=(image_size, image_size),
        batch_size=batch_size,
        class_mode='categorical',
        subset=subset)

    while True:
        gnext = gen.next()
        # yield the image batch and the same labels for both model outputs
        yield gnext[0], [gnext[1], gnext[1]]

which is used as follows:

datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.20)
custom_train_generator = multiple_outputs(generator = datagen,
                                              image_dir = base_dir,
                                              batch_size = train_batch_size,
                                              image_size = img_width,
                                              subset = 'training')

and then the custom generator is passed during fit:

history = self.model.fit_generator(custom_train_generator,
                                           epochs=nb_of_epochs,
                                           steps_per_epoch=steps_per_epoch,
                                           validation_data=validation_data,
                                           validation_steps=validation_steps,
                                           callbacks=callbacks_list)
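
where validation_data comes from the same helper with subset='validation' (a sketch for completeness; the exact variable names here are assumptions and may differ slightly from my actual code):

# hypothetical validation generator, mirroring the training one above
custom_validation_generator = multiple_outputs(generator = datagen,
                                               image_dir = base_dir,
                                               batch_size = train_batch_size,
                                               image_size = img_width,
                                               subset = 'validation')
# passed to fit_generator as validation_data, with validation_steps set to
# (number of validation images) // train_batch_size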

Any thoughts?

kenrickfernandes commented 3 years ago

Hello @GKalliatakis, were you able to solve this issue?

sborquez commented 3 years ago

Hello, I think the order of the arguments in the K.categorical_crossentropy calls is wrong 🤔. In the Keras documentation, y_true is the first argument and y_pred is the second:

undistorted_loss = K.categorical_crossentropy(true, pred, from_logits=True)

distorted_loss = K.categorical_crossentropy(true, pred + std_samples, from_logits=True)

Should the pred and true arguments be swapped?
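
A toy check makes the sign flip visible (made-up values; assumes a tf.keras backend):

from tensorflow.keras import backend as K

y_true = K.constant([[0., 1., 0.]])        # one-hot labels
y_logits = K.constant([[-2.0, 3.0, 0.5]])  # raw logits; note the negative entry

# documented order: target first, then logits -> always >= 0
correct = K.categorical_crossentropy(y_true, y_logits, from_logits=True)
# swapped order (as in the code above): logits land in the target slot -> can go negative
swapped = K.categorical_crossentropy(y_logits, y_true, from_logits=True)

print(K.eval(correct))  # small positive value
print(K.eval(swapped))  # negative for these values

With from_logits=True the second argument goes through softmax and the first is used as the target weights, so when the raw logits sit in the target slot their negative entries can push -sum(target * log(softmax(output))) below zero, which would explain the negative loss and logits_variance_loss in the screenshot.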