tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0
1.84k stars 275 forks source link

tfrs.models.Model "fit" and "evaluate" Return Loss Values for Last Batch Only #585

Open pof-declaneaston opened 1 year ago

pof-declaneaston commented 1 year ago

When I call "fit" or "evaluate" on a tfrs.model.Model the loss values (total_loss, loss, and regularization_loss) returned are only based on just the last batch (or last batch of each epoch for "fit"). I believe this is because of the way train_step and test_step are implemented.

  def train_step(self, inputs):
    """Custom train step using the `compute_loss` method."""

    with tf.GradientTape() as tape:
      loss = self.compute_loss(inputs, training=True)

      # Handle regularization losses as well.
      regularization_loss = sum(self.losses)

      total_loss = loss + regularization_loss

    gradients = tape.gradient(total_loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

    metrics = {metric.name: metric.result() for metric in self.metrics}
    metrics["loss"] = loss
    metrics["regularization_loss"] = regularization_loss
    metrics["total_loss"] = total_loss

    return metrics

  def test_step(self, inputs):
    """Custom test step using the `compute_loss` method."""

    loss = self.compute_loss(inputs, training=False)

    # Handle regularization losses as well.
    regularization_loss = sum(self.losses)

    total_loss = loss + regularization_loss

    metrics = {metric.name: metric.result() for metric in self.metrics}
    metrics["loss"] = loss
    metrics["regularization_loss"] = regularization_loss
    metrics["total_loss"] = total_loss

    return metrics

From what I understand the "metrics" values returned from these functions are meant to be metrics for the entire dataset (or epoch) up to and including this batch. The values here are just the last batch. The following code reproduces the issue.

import tensorflow_recommenders as tfrs
import tensorflow as tf

class Model(tfrs.models.Model):
    def __init__(self):
        super().__init__()

    @tf.function
    def compute_loss(self, inputs, training=False) -> tf.Tensor:
        return inputs[0]

dataset = tf.data.Dataset.from_tensor_slices(([1, 1, 1, 10], [0, 0, 0, 0]))

model = Model()
model.compile()

h = model.fit(dataset, epochs=5)
print(h.__dict__)

result = model.evaluate(dataset, return_dict=True)
print(result)

I get the following as output

Epoch 1/5
WARNING:tensorflow:The dtype of the target tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the target tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
Epoch 2/5
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
Epoch 3/5
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
Epoch 4/5
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
Epoch 5/5
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
{'validation_data': None, 'model': <__main__.MeetMeModel object at 0x16212e7f0>, '_chief_worker_only': None, '_supports_tf_logs': False, 'history': {'loss': [10, 10, 10, 10, 10], 'regularization_loss': [0, 0, 0, 0, 0], 'total_loss': [10, 10, 10, 10, 10]}, 'params': {'verbose': 1, 'epochs': 5, 'steps': 4}, 'epoch': [0, 1, 2, 3, 4]}
4/4 [==============================] - 0s 1ms/step - loss: 4.6000 - regularization_loss: 0.0000e+00 - total_loss: 4.6000
{'loss': 10, 'regularization_loss': 0, 'total_loss': 10}
schistyakov commented 1 year ago

Yes, I agree, there should be created something like self.loss_tracker = keras.metrics.Mean(name="loss") for each additional loss in __init__ of the Model and in train_step and test_step they should be updated by self.loss_tracker.update_state(loss)