tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

Unable to save model checkpoint with a TensorFlow Recommenders subclassed model #669

Open karndeepsingh opened 1 year ago

karndeepsingh commented 1 year ago

Hi, I used the TensorFlow Recommenders MovieLens tutorial to train a recommender model. But when I pass a ModelCheckpoint callback to model.fit(), it raises the following error:

the input shape is not available or because the forward pass of the model is not defined.To define a forward pass, please override `Model.call()`. To specify an input shape, either call `build(input_shape)` directly, or call the model on actual data using `Model()`, `Model.fit()`, or `Model.predict()`. If you have a custom training step, please make sure to invoke the forward pass in train step through `Model.__call__`, i.e. `model(inputs)`, as opposed to `model.call()`.
[Screenshot of the error, 2023-05-27]

I am also unable to call model.summary(). Please help me resolve this. What changes do I need to make to the subclassed model so that I can save the best model checkpoint?

Here is the tutorial link that I am using for training: https://www.tensorflow.org/recommenders/examples/deep_recommenders

alexstrid commented 1 year ago

Have you solved this?

Tmoradi commented 11 months ago

^

patrickorlando commented 11 months ago

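You can define a forward pass by overriding call() and then use it in the compute_loss function, invoking it as self(features) instead of calling the query and candidate models directly.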

This should work, but there is a caveat. If your model architecture has layers that are shared between the query and candidate towers (for example, an item_id embedding in a sequential recommendation model), then I think the checkpoint will save these as separate variables. This may cause an issue when restoring from the checkpoint.

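import tensorflow_recommenders as tfrs

# QueryModel, CandidateModel and the `movies` dataset are defined as in the
# deep_recommenders tutorial linked above.
class MovielensModel(tfrs.models.Model):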

  def __init__(self, layer_sizes):
    super().__init__()
    self.query_model = QueryModel(layer_sizes)
    self.candidate_model = CandidateModel(layer_sizes)
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.candidate_model),
        ),
    )

  def call(self, features, training=False):
    # Forward pass. Overriding call() gives Keras a forward pass it can trace
    # when saving the full model.
    # We only pass the user id and timestamp features into the query model. This
    # is to ensure that the training inputs would have the same keys as the
    # query inputs. Otherwise the discrepancy in input structure would cause an
    # error when loading the query model after saving it.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
        "timestamp": features["timestamp"],
    })
    movie_embeddings = self.candidate_model(features["movie_title"])
    return (query_embeddings, movie_embeddings)

  def compute_loss(self, features, training=False):
    # Run the forward pass through Model.__call__, i.e. self(...), rather than
    # self.call(...), as the Keras error message recommends.
    query_embeddings, movie_embeddings = self(features, training=training)

    return self.task(
        query_embeddings, movie_embeddings, compute_metrics=not training)
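For anyone who wants to try the pattern without the full tutorial setup, here is a minimal, self-contained sketch of the same idea in plain Keras: a hypothetical toy model (TwoTowerModel, not part of TensorFlow Recommenders) that defines call() and invokes it through self(...) in its training step, plus a ModelCheckpoint callback that keeps the best epoch. A few choices here are my own assumptions rather than anything from the thread: the in-batch softmax loss stands in for tfrs.tasks.Retrieval, the callback monitors training loss because there is no validation set, and it uses save_weights_only=True, which writes only the weights instead of exporting the full model. The ".weights.h5" suffix is used because Keras 3 requires it for weights-only checkpoints; please check this against your TensorFlow/Keras version.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class TwoTowerModel(tf.keras.Model):
  """Hypothetical toy two-tower model; stands in for MovielensModel."""

  def __init__(self, num_users=10, num_items=10, dim=8):
    super().__init__()
    self.user_tower = tf.keras.layers.Embedding(num_users, dim)
    self.item_tower = tf.keras.layers.Embedding(num_items, dim)

  def call(self, features, training=False):
    # Forward pass: one embedding per tower.
    user_emb = self.user_tower(features["user_id"])
    item_emb = self.item_tower(features["item_id"])
    return user_emb, item_emb

  def train_step(self, features):
    with tf.GradientTape() as tape:
      # Go through Model.__call__ (self(...)), not self.call(...).
      user_emb, item_emb = self(features, training=True)
      # In-batch softmax (assumption): the i-th item is the positive
      # for the i-th user, all other items in the batch are negatives.
      logits = tf.matmul(user_emb, item_emb, transpose_b=True)
      labels = tf.range(tf.shape(logits)[0])
      loss = tf.reduce_mean(
          tf.nn.sparse_softmax_cross_entropy_with_logits(
              labels=labels, logits=logits))
    grads = tape.gradient(loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
    return {"loss": loss}


def make_dataset():
  # Toy data: 8 (user, item) pairs in batches of 4.
  ids = np.arange(8, dtype="int32")
  return tf.data.Dataset.from_tensor_slices(
      {"user_id": ids, "item_id": ids}).batch(4)


def train_with_checkpoint(ckpt_dir, epochs=3):
  ds = make_dataset()
  model = TwoTowerModel()
  model(next(iter(ds)))  # create the variables before fit()
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1))
  ckpt_path = os.path.join(ckpt_dir, "best.weights.h5")
  checkpoint = tf.keras.callbacks.ModelCheckpoint(
      filepath=ckpt_path,
      monitor="loss",          # training loss; no validation set here
      save_best_only=True,     # keep only the lowest-loss epoch
      save_weights_only=True)  # weights only, no full-model export
  model.fit(ds, epochs=epochs, callbacks=[checkpoint], verbose=0)
  return model, ckpt_path


def restore_from_checkpoint(ckpt_path, batch):
  restored = TwoTowerModel()
  restored(batch)  # create the variables so the saved weights have a target
  restored.load_weights(ckpt_path)
  return restored
```

To restore, rebuild the model, call it once on a batch so its variables exist, then call load_weights() on the checkpoint path, as restore_from_checkpoint() does. With the tutorial's MovielensModel, the same callback object should be passable to model.fit(..., callbacks=[checkpoint]).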