Open tansaku opened 3 years ago
the strange thing is that if I reduce the amount of movielens data being used (take the first 100 mappings), I still get sensible output from the model, e.g.
gives the following
ranking, titles = index(np.array(["1"]))
print(f"Top 10 recommendations for user {user_ids_vocabulary.get_vocabulary()[1]}: {titles}")
print(f"thing: {ranking}")
ranking, titles = index(np.array(["2"]))
print(f"Top 10 recommendations for user {user_ids_vocabulary.get_vocabulary()[2]}: {titles}")
print(f"thing: {ranking}")
Top 10 recommendations for user 699: [[b'Harold and Maude (1971)' b'Rock, The (1996)'
b'Mulholland Falls (1996)' b'Four Weddings and a Funeral (1994)'
b'Aladdin (1992)' b'Sense and Sensibility (1995)' b'Local Hero (1983)'
b'Jungle2Jungle (1997)' b"Antonia's Line (1995)"
b'Man Without a Face, The (1993)']]
thing: [[0.16602841 0.14439449 0.13890035 0.13128598 0.11802915 0.11523528
0.11077308 0.10783543 0.09914517 0.09864655]]
Top 10 recommendations for user 663: [[b'Harold and Maude (1971)' b'Rock, The (1996)'
b'Mulholland Falls (1996)' b'Four Weddings and a Funeral (1994)'
b'Aladdin (1992)' b'Sense and Sensibility (1995)' b'Local Hero (1983)'
b'Jungle2Jungle (1997)' b"Antonia's Line (1995)"
b'Man Without a Face, The (1993)']]
thing: [[0.16602841 0.14439449 0.13890035 0.13128598 0.11802915 0.11523528
0.11077308 0.10783543 0.09914517 0.09864655]]
implying that the number of mappings is not an issue - so could it just be the number of possible values of user id and movies - i.e. we've got over 1000 movies, but the <100 interests in my data is the problem?
But even if I reduce the number of movies to 100 I still get great recommendations (although they are the same for each user) - so either I've got some silly broken thing in my approach, or my reduction in movielens data is not actually coming through due to some caching mechanism ...
I was facing a similar issue. My dataset:
then after training i had:
Recommendations for user 42: [b'clara callan' b'clara callan' b'clara callan']
That was due to my books dataset, which corresponds to the movies dataset in the example:
ratings = ratings.map(lambda x: {
"movie_title": x["movie_title"],
"user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])
The movies dataset is supposed to hold each movie exactly once, because it is used as the candidate set in FactorizedTopK.
In my case
books = books.map(lambda x: x["book_title"])
books were not unique.
After preprocessing my model gives me:
[b'clara callan', b'flu: the story of the great influenza pandemic of 1918 and the search for the virus that caused it', b"the kitchen god's wife"]
Hope it can help!
wow thanks @GaetanDu that makes a lot of sense - was there a simple map operation to ensure uniqueness - converting to a set and then back to a list again maybe?
I did just have a look at my interests, and printing them out, they are all unique ... as are the user ids ... ah, wait, but that's after they have already been processed as part of the vocabulary adaptation step ...
My data is a pandas dataframe, so I didn't use TensorFlow to build the unique representation. Here is how I preprocess:
unique_book_titles_df = pd.DataFrame(overall_data.book_title.unique(), columns=['book_title'])
books = {key: col.values for key, col in dict(unique_book_titles_df).items()}
books = tf.data.Dataset.from_tensor_slices(books)
books (movies in the example) is then used both in FactorizedTopK and in the index method of BruteForce.
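As a minimal sketch of the deduplication step discussed above, assuming the candidate titles are available as a plain Python list (the `titles` values and the helper name `unique_in_order` are illustrative, not part of TFRS), one order-preserving way to keep each candidate exactly once is:

```python
def unique_in_order(items):
    """Return each item once, keeping the position of its first occurrence."""
    # dict preserves insertion order (Python 3.7+), so this removes duplicates
    # without the arbitrary reordering that set() would introduce.
    return list(dict.fromkeys(items))


# Illustrative data: one row per rating, so titles repeat.
titles = ["clara callan", "the kitchen god's wife", "clara callan", "clara callan"]
candidates = unique_in_order(titles)
print(candidates)
```

The resulting list could then be passed to `tf.data.Dataset.from_tensor_slices`, in the same way as the pandas-based version above.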
thanks @GaetanDu - really appreciate you sharing that
I just tried
unique_interests = set()
for i in interests.take(25000):
    unique_interests.add(i.numpy())
print(unique_interests)
unique_users = set()
for i in users.take(25000):
    unique_users.add(i.numpy())
users = tf.data.Dataset.from_tensor_slices(list(unique_users))
user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(users)
interests = tf.data.Dataset.from_tensor_slices(list(unique_interests))
interests_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
interests_vocabulary.adapt(interests)
and this runs, and I get a variety of output within an individual recommendation, but the recommendations for each user are the same as for every other user ...
getting
[[b'Gaming' b'Animation' b'Tech' b'Design' b'Bridal' b'Couture' b'Dance' b'Personal Fitness' b'Musical' b'Gardening']]
is better than
[[b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy' b'Comedy']]
but I would still expect each user to get some variation in their set of recommendations
do you get different recommendations for different users ...?
Yes, I get different recommendations. What are you passing to:
tfrs.tasks.Retrieval(
metrics=tfrs.metrics.FactorizedTopK(
candidates=books.batch(128).map(self.book_model)
)
)
and
# Create a model that takes in raw query features, and
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# recommends movies out of the entire movies dataset.
index.index(books.batch(1000).map(model.book_model), books)
As you can see books is unique because i created it from dataframe.unique().
My entire model:
dataset = {key: col.values for key, col in dict(overall_data[['user_id', 'book_title', 'rating']]).items()}
dataset = tf.data.Dataset.from_tensor_slices(dataset).prefetch(tf.data.AUTOTUNE)
unique_book_titles_df = pd.DataFrame(overall_data.book_title.unique(), columns=['book_title'])
books = {key: col.values for key, col in dict(unique_book_titles_df).items()}
books = tf.data.Dataset.from_tensor_slices(books).prefetch(tf.data.AUTOTUNE)
ratings = dataset.map((lambda x: {
"book_title": x["book_title"],
"user_id": x["user_id"],
"user_rating": x["rating"]
}), num_parallel_calls=tf.data.AUTOTUNE)
books = books.map(lambda x: x["book_title"], num_parallel_calls=tf.data.AUTOTUNE)
tf.random.set_seed(42)
shuffled = ratings.shuffle(26562, seed=42, reshuffle_each_iteration=False)
train = shuffled.take(24000)
test = shuffled.skip(24000).take(2562)
unique_user_ids = overall_data.user_id.unique()
unique_book_titles = overall_data.book_title.unique()
class BookModel(tfrs.models.Model):
    def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
        # We take the loss weights in the constructor: this allows us to instantiate
        # several model objects with different loss weights.
        super().__init__()
        embedding_dimension = 32
        # User and book models.
        self.book_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_book_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_book_titles) + 1, embedding_dimension)
        ])
        self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
        ])
        # A small model to take in user and book embeddings and predict ratings.
        # We can make this as complicated as we want as long as we output a scalar
        # as our prediction.
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        # The tasks.
        self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()],
        )
        self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=books.batch(128).map(self.book_model)
            )
        )
        # The loss weights.
        self.rating_weight = rating_weight
        self.retrieval_weight = retrieval_weight

    def call(self, features) -> tf.Tensor:
        # We pick out the user features and pass them into the user model.
        user_embeddings = self.user_model(features["user_id"])
        # And pick out the book features and pass them into the book model.
        book_embeddings = self.book_model(features["book_title"])
        return (
            user_embeddings,
            book_embeddings,
            # We apply the multi-layered rating model to a concatenation of
            # user and book embeddings.
            self.rating_model(
                tf.concat([user_embeddings, book_embeddings], axis=1)
            ),
        )

    def compute_loss(self, features, training=False) -> tf.Tensor:
        ratings = features.pop("user_rating")
        user_embeddings, book_embeddings, rating_predictions = self(features)
        # We compute the loss for each task.
        rating_loss = self.rating_task(
            labels=ratings,
            predictions=rating_predictions,
        )
        retrieval_loss = self.retrieval_task(user_embeddings, book_embeddings)
        # And combine them using the loss weights.
        return (self.rating_weight * rating_loss
                + self.retrieval_weight * retrieval_loss)
thanks for sharing all - much appreciated - I'll see if I can replicate
For my task I have:
task = tfrs.tasks.Retrieval(
metrics=tfrs.metrics.FactorizedTopK(
interests.batch(128).map(interest_model)
)
)
and for the search I have:
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index(interests.batch(100).map(model.interest_model), interests)
I'll clean up my code and share the full set next week - but mine is basically a copy of the TensorFlow/recommenders/docs/examples/quickstart.ipynb
Yes, it will be easier to debug if you share your code
thanks @GaetanDu - so here's what I'm doing that I just ran in a fresh notebook. I've added a step to ensure the vocabularies are being built from a unique set of elements
from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
import pandas as pd
# Make numpy values easier to read.
np.set_printoptions(precision=3, suppress=True)
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
import matplotlib.pyplot as plt
interest_train = pd.read_csv("user-interests.csv", header=0)
# Split the ';'-separated interests into lists, then emit one row per (user, interest).
interest_train = interest_train.assign(interest_list=interest_train['interest_list'].str.split(';')).explode('interest_list').reset_index(drop=True)
training_dataset = (
tf.data.Dataset.from_tensor_slices(
(
tf.cast(interest_train['interest_list'].values, tf.string),
tf.cast(tf.as_string(interest_train['user_ident'].values), tf.string)
)
)
)
adjusted_training_dataset = training_dataset.map(lambda x,y: {
"interest": x,
"user_id": y
})
interests = training_dataset.map(lambda x,y: x)
users = training_dataset.map(lambda x,y: y)
unique_interests = set()
for i in interests.take(25000):
    unique_interests.add(i.numpy())
unique_users = set()
for i in users.take(25000):
    unique_users.add(i.numpy())
users = tf.data.Dataset.from_tensor_slices(list(unique_users))
user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(users)
interests = tf.data.Dataset.from_tensor_slices(list(unique_interests))
interests_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
interests_vocabulary.adapt(interests)
class InterestModel(tfrs.Model):
    # We derive from a custom base class to help reduce boilerplate. Under the hood,
    # these are still plain Keras Models.
    def __init__(
            self,
            user_model: tf.keras.Model,
            interest_model: tf.keras.Model,
            task: tfrs.tasks.Retrieval):
        super().__init__()
        # Set up user and interest representations.
        self.user_model = user_model
        self.interest_model = interest_model
        # Set up a retrieval task.
        self.task = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # Define how the loss is computed.
        user_embeddings = self.user_model(features["user_id"])
        interest_embeddings = self.interest_model(features["interest"])
        return self.task(user_embeddings, interest_embeddings)
# Define user and interest models.
user_model = tf.keras.Sequential([
user_ids_vocabulary,
tf.keras.layers.Embedding(user_ids_vocabulary.vocab_size(), 64)
])
interest_model = tf.keras.Sequential([
interests_vocabulary,
tf.keras.layers.Embedding(interests_vocabulary.vocab_size(), 64)
])
# Define your objectives.
task = tfrs.tasks.Retrieval(
metrics=tfrs.metrics.FactorizedTopK(
interests.batch(128).map(interest_model)
)
)
# Create a retrieval model.
model = InterestModel(user_model, interest_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))
# Train for 3 epochs.
model.fit(adjusted_training_dataset.batch(4096), epochs=3)
# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index(interests.batch(1000).map(model.interest_model), interests)
so running this:
Epoch 1/3
6/6 [==============================] - 1s 107ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_50_categorical_accuracy: 0.9972 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 34003.7478 - regularization_loss: 0.0000e+00 - total_loss: 34003.7478
Epoch 2/3
6/6 [==============================] - 1s 106ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.1905 - factorized_top_k/top_10_categorical_accuracy: 0.3515 - factorized_top_k/top_50_categorical_accuracy: 0.9714 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 32277.0315 - regularization_loss: 0.0000e+00 - total_loss: 32277.0315
Epoch 3/3
6/6 [==============================] - 1s 109ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0352 - factorized_top_k/top_5_categorical_accuracy: 0.6068 - factorized_top_k/top_10_categorical_accuracy: 0.7307 - factorized_top_k/top_50_categorical_accuracy: 0.9264 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 31529.4378 - regularization_loss: 0.0000e+00 - total_loss: 31529.4378
I do now get a set of recommendations in a list of 10 where each recommendation is different from the other, e.g. ['food', 'swimming', 'flowers', ...] (which is an improvement over my original where it was like ['food,'food','food',...]) but even with a variety of interests recommended, each individual user gets the same recommendations in the same order ...
I wonder if my data just doesn't have the right distribution to work with this approach ...
Hello, are you sure it is a bad recommendation? I get identical recommendations when I query with new user_ids; do you? I suppose this is because all new users share the same embedding.
it seems strange that they should all be identical. Each user has a different combination of interests, so I would have thought that on that basis there should be some variation between users.
My suspicion is that this approach (for this number of epochs, learning rate and network size) relies on there being sufficient numbers of items "rated" by each user. When we reduce to a much smaller movielens dataset where each user has only rated a single movie we get the same behaviour, i.e. same set of recommendations for all users
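To check the suspicion about sparse data before training, one could count how many interactions each user has. This is a minimal sketch, assuming the mappings are available as (user_id, item) tuples; the `pairs` values and the helper name `interactions_per_user` are made up for illustration:

```python
from collections import Counter


def interactions_per_user(pairs):
    """Count how many (user_id, item) rows each user has."""
    return Counter(user_id for user_id, _ in pairs)


# Hypothetical mappings in the same (user_id, interest) shape as the data in this thread.
pairs = [
    ("990000123456", "Books"),
    ("990000123456", "Dance"),
    ("990000123457", "Beer"),
]
counts = interactions_per_user(pairs)
# Histogram: how many users have exactly n interactions.
histogram = Counter(counts.values())
print(counts.most_common())
print(sorted(histogram.items()))
```

A histogram dominated by users with a single interaction would be consistent with the reduced MovieLens experiment described above, though it would not prove that sparsity is the cause.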
Hi @tansaku, I am having a similar problem. It happens that even though I am overfitting to my validation and training data, my predictions are the same for all users. I see in your output that you have the same principle. Did you solve it? I will appreciate your opinion. Maybe @maciejkula can give us his thought on this :)
hi @williamberrios thanks for reaching out. I did not solve it yet. At the moment I am trying to visualize the weights via tensorboard, and trying to work out if there is some setting of the hyperparameters or smaller network size that can fix it.
I did get improvements from following @GaetanDu 's suggestions, and interesting that it worked for him for his dataset. I suspect something about the nature of our data distributions is causing this. Can you share anything about the distribution of your data?
I did also reach out to @maciejkula to ask for his input - would be great to better understand what about the data and network architecture allows this to work in some cases and not others
Lots of things could be going wrong here.
- When constructing the candidates dataset you pass to the FactorizedTopK layers, you must include each unique candidate once.
- Make sure your vocabularies are working correctly. Are you sure that all of your users are not mapped to the same OOV bucket?
- Are there numerical problems? Is the model over-regularized, and all embeddings tend to zero?
hi @tansaku, in my case I'm also following the tutorials with my own dataset (basically pretty similar to movies). The pattern I have seen is that when my network doesn't overfit, it produces different scores and recommendations for all users. However, when it overfits on the training and validation sets, it produces equal scores and predictions for all users, which I think is counterintuitive, because if I'm overfitting it should give me the same predictions as my original dataset. I could be wrong though.
hi @williamberrios I'm not sure what's intuitive or not at this point :-) I know that when I massively reduce the size of the MovieLens data set I get increasingly repetitive output from the recommender. At very low levels every single recommendation is the same, as I increase the data amounts the output becomes more varied. With other things held equal that would correspond to overfitting.
I tried training my data on just one epoch, but still the same problem. How are you measuring degree of overfit?
I'm trying to work out if there is another way to reduce the number of params in the model ...
@maciejkula - thanks so much for the advice - apologies, I had somehow missed your comment :-)
Taking each of your points in turn:
- When constructing the candidates dataset you pass to the FactorizedTopK layers, you must include each unique candidate once.
print("interests are unique:")
print(len(unique_interests) == len(set(unique_interests)))
print(random.choice(tuple(unique_interests)))
interests are unique:
True
b'Beer'
print("users are unique:")
print(len(unique_users) == len(set(unique_users)))
print(random.choice(tuple(unique_users)))
users are unique:
True
b'200001234567'
My data has precisely 75 interests and 4866 users, each of which are unique. There are 24540 mappings, each in the form:
{'interest': <tf.Tensor: shape=(), dtype=string, numpy=b'Books'>, 'user_id': <tf.Tensor: shape=(), dtype=string, numpy=b'990000123456'>}
- Make sure your vocabularies are working correctly. Are you sure that all of your users are not mapped to the same OOV bucket?
I don't think so. If I print the vocabularies I see this:
print(user_ids_vocabulary.get_vocabulary())
['[UNK]', '99000012345', '99000012346', '99000012347', '99000012348' ...
print(interests_vocabulary.get_vocabulary())
['[UNK]', 'World traveller', 'Winter sports', 'Wine', ...
I assume this indicates that they are not all in some out of vocabulary bucket ... as does this below
data = tf.constant([["990000123456"]])
print(user_ids_vocabulary(data))
tf.Tensor([[1]], shape=(1, 1), dtype=int64)
please do correct me if other aspects need to be checked
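One further check that may be worth running is on the exact strings passed to the index at query time, not only on an id known to be in the data. The sketch below is framework-free and rests on assumptions: `vocabulary` stands in for the list returned by `get_vocabulary()`, the ids are illustrative, and both the vocabulary and the queries are taken to be `str` (values read back from a tf.data pipeline may be `bytes` and would need decoding first).

```python
def oov_fraction(vocabulary, queries):
    """Return the share of query strings absent from the vocabulary, plus the misses."""
    if not queries:
        return 0.0, []
    known = set(vocabulary)
    missing = [q for q in queries if q not in known]
    return len(missing) / len(queries), missing


# Stand-in for user_ids_vocabulary.get_vocabulary(); the ids are illustrative.
vocabulary = ["[UNK]", "990000123456", "990000123457"]
fraction, missing = oov_fraction(vocabulary, ["1", "2", "990000123456"])
print(fraction, missing)
```

A non-zero fraction means some queries fall into the out-of-vocabulary bucket and would therefore share one embedding.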
- Are there numerical problems? Is the model over-regularized, and all embeddings tend to zero?
I've viewed the embeddings using tensorboard and everything seems nicely spread out. I can't immediately work out a simpler way of viewing a summary of the embeddings. I don't think we have any regularization - you can see the full code earlier in this thread.
user_ids_embedding = tf.keras.layers.Embedding(user_ids_vocabulary.vocab_size(), 16)
tf.print(user_ids_embedding.input_dim)
tf.print(user_ids_embedding.output_dim)
tf.print(user_ids_embedding.embeddings_regularizer)
tf.print(user_ids_embedding.activity_regularizer)
tf.print(user_ids_embedding.embeddings_constraint)
4867
16
None
None
None
interests_embedding = tf.keras.layers.Embedding(interests_vocabulary.vocab_size(), 16)
tf.print(interests_embedding.input_dim)
tf.print(interests_embedding.output_dim)
tf.print(interests_embedding.embeddings_regularizer)
tf.print(interests_embedding.activity_regularizer)
tf.print(interests_embedding.embeddings_constraint)
76
16
None
None
None
I guess our main control on the number of parameters in the model is the dimension on the embedding? I've tried walking these down to as low as 2, but to no effect. I've tried training with single epochs, and with 100 epochs, but the results are always the same - same predicted interests for all users ...
Maybe it's due to some aspect of the distribution of my data, or some other silly mistake in the code ... I feel like I need to run with a very very small sample and print out all the weights/params to better understand what's happening ...
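For a quicker summary of the embeddings than TensorBoard, a small NumPy helper can report row norms and the number of distinct rows. This is a sketch under assumptions: the helper name `summarize_embeddings` and the `demo` matrix are illustrative, and with a trained Keras layer the matrix would typically come from `embedding_layer.get_weights()[0]`.

```python
import numpy as np


def summarize_embeddings(weights):
    """Summarise a (num_rows, dim) embedding matrix."""
    weights = np.asarray(weights, dtype=np.float32)
    norms = np.linalg.norm(weights, axis=1)
    return {
        "shape": weights.shape,
        "min_norm": float(norms.min()),
        "mean_norm": float(norms.mean()),
        "max_norm": float(norms.max()),
        # Fewer distinct rows than total rows means some ids share a vector.
        "distinct_rows": int(np.unique(weights, axis=0).shape[0]),
    }


# Illustrative matrix: one all-zero row and two identical rows.
demo = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
summary = summarize_embeddings(demo)
print(summary)
```

Norms near zero for every row would point towards the over-regularization scenario raised above, while many duplicate rows would suggest a lookup problem.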
some other relevant data is the model weights, which look okay I think?
[<tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fba012ac6d0>, <tf.Variable 'embedding_9/embeddings:0' shape=(4867, 64) dtype=float32, numpy=
array([[ 0.049, -0.048, 0.036, ..., 0.013, 0.014, 0.009],
[-1.676, -0.38 , -0.834, ..., 1.884, 1.316, 0.621],
[ 0.683, -1.057, 2.111, ..., -0.691, 0.627, 0.105],
...,
[-1.034, -2.202, 0.195, ..., -0.514, -0.506, 0.385],
[ 0.515, 0.14 , -0.024, ..., -0.043, -0.638, -0.865],
[ 0.303, -0.776, -1.835, ..., 0.462, -0.544, -1.108]],
dtype=float32)>, <tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fb990ba5c40>, <tf.Variable 'embedding_10/embeddings:0' shape=(76, 64) dtype=float32, numpy=
array([[ 0.019, -0.027, 0.002, ..., -0.029, -0.031, -0.011],
[ 0.104, 0.024, 0.357, ..., 0.041, -0.37 , 0.307],
[-0.247, -0.03 , 0.221, ..., -0.008, -0.134, -0.028],
...,
[-0.221, 0.01 , 0.404, ..., 0.05 , 0.016, 0.507],
[-0.143, -0.034, 0.2 , ..., 0.196, 0.157, 0.346],
[-0.187, -0.169, 0.347, ..., 0.557, 0.082, -0.141]],
dtype=float32)>, <tf.Variable 'counter:0' shape=() dtype=int32, numpy=75>, <tf.Variable 'total:0' shape=() dtype=float32, numpy=2192.0>, <tf.Variable 'count:0' shape=() dtype=float32, numpy=24540.0>, <tf.Variable 'total:0' shape=() dtype=float32, numpy=10504.0>, <tf.Variable 'count:0' shape=() dtype=float32, numpy=24540.0>, <tf.Variable 'total:0' shape=() dtype=float32, numpy=15402.0>, <tf.Variable 'count:0' shape=() dtype=float32, numpy=24540.0>, <tf.Variable 'total:0' shape=() dtype=float32, numpy=23828.0>, <tf.Variable 'count:0' shape=() dtype=float32, numpy=24540.0>, <tf.Variable 'total:0' shape=() dtype=float32, numpy=24540.0>, <tf.Variable 'count:0' shape=() dtype=float32, numpy=24540.0>]
and then the index weights:
[<tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fba012ac6d0>, <tf.Variable 'embedding_9/embeddings:0' shape=(4867, 64) dtype=float32, numpy=
array([[ 0.049, -0.048, 0.036, ..., 0.013, 0.014, 0.009],
[-1.676, -0.38 , -0.834, ..., 1.884, 1.316, 0.621],
[ 0.683, -1.057, 2.111, ..., -0.691, 0.627, 0.105],
...,
[-1.034, -2.202, 0.195, ..., -0.514, -0.506, 0.385],
[ 0.515, 0.14 , -0.024, ..., -0.043, -0.638, -0.865],
[ 0.303, -0.776, -1.835, ..., 0.462, -0.544, -1.108]],
dtype=float32)>, <tf.Variable 'identifiers:0' shape=(75,) dtype=string, numpy=
array([b'Design', ... b'Gardening'], dtype=object)>, <tf.Variable 'candidates:0' shape=(75, 64) dtype=float32, numpy=
array([[ 0.046, 0.027, 0.257, ..., 0.331, 0.122, 0.241],
[ 0.021, -0.103, 0.511, ..., 0.149, -0.179, 0.392],
[ 0.141, -0.166, 0.565, ..., 0.123, 0.265, 0.401],
...,
[-0.15 , -0.288, 0.189, ..., -0.085, -0.314, 0.24 ],
[ 0.016, 0.164, 0.022, ..., -0.009, -0.047, 0.256],
[-0.216, 0.102, 0.379, ..., 0.081, 0.281, -0.013]],
dtype=float32)>]
it looks like there is sensible variation there - am I just querying the index incorrectly? I guess I should review the underlying code and perhaps run with a very small sample where I can see all the weights ... https://github.com/tensorflow/recommenders/blob/v0.5.2/tensorflow_recommenders/layers/factorized_top_k.py#L428-L553
What are your evaluation metrics? Are they good? Bad?
you mean these?
Epoch 1/3
6/6 [==============================] - 1s 111ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0879 - factorized_top_k/top_5_categorical_accuracy: 0.2595 - factorized_top_k/top_10_categorical_accuracy: 0.3707 - factorized_top_k/top_50_categorical_accuracy: 0.8476 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 40636.7500 - regularization_loss: 0.0000e+00 - total_loss: 40636.7500
Epoch 2/3
6/6 [==============================] - 1s 109ms/step - factorized_top_k/top_1_categorical_accuracy: 0.1163 - factorized_top_k/top_5_categorical_accuracy: 0.3784 - factorized_top_k/top_10_categorical_accuracy: 0.5273 - factorized_top_k/top_50_categorical_accuracy: 0.9253 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 32419.3457 - regularization_loss: 0.0000e+00 - total_loss: 32419.3457
Epoch 3/3
6/6 [==============================] - 1s 111ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0893 - factorized_top_k/top_5_categorical_accuracy: 0.4280 - factorized_top_k/top_10_categorical_accuracy: 0.6276 - factorized_top_k/top_50_categorical_accuracy: 0.9710 - factorized_top_k/top_100_categorical_accuracy: 1.0000 - loss: 29677.4032 - regularization_loss: 0.0000e+00 - total_loss: 29677.4032
where we can see the loss dropping from epoch to epoch?
I'm trying to better understand the precise operations. To that end I have made a version with 3 interests and 2 users from 3 mappings. I gave embedding output dimensions of 2, and that allows me to print all the weights in both the model and the index.
These are the model weights:
<tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fe999901f70>
<tf.Variable 'embedding_4/embeddings:0' shape=(3, 2) dtype=float32, numpy=
array([[-0.001, -0.037],
[-0.058, 0.031],
[-0.019, 0.01 ]], dtype=float32)>, <tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fe9f9358a90>
<tf.Variable 'embedding_5/embeddings:0' shape=(4, 2) dtype=float32, numpy=
array([[-0.026, -0.028],
[-0.004, -0.012],
[-0.012, 0.029],
[-0.03 , -0.02 ]], dtype=float32)>
<tf.Variable 'counter:0' shape=() dtype=int32, numpy=3>
<tf.Variable 'total:0' shape=() dtype=float32, numpy=0.0>
<tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'total:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'total:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'total:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'total:0' shape=() dtype=float32, numpy=3.0>
<tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>
and these are the index weights:
<tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7fe999901f70> <tf.Variable 'embedding_4/embeddings:0' shape=(3, 2) dtype=float32, numpy=
array([[-0.001, -0.037],
[-0.058, 0.031],
[-0.019, 0.01 ]], dtype=float32)>
<tf.Variable 'identifiers:0' shape=(3,) dtype=string, numpy=
array([b'Sustainability', b'Dance', b'Books'], dtype=object)>
<tf.Variable 'candidates:0' shape=(3, 2) dtype=float32, numpy=
array([[-0.004, -0.012],
[-0.012, 0.029],
[-0.03 , -0.02 ]], dtype=float32)>
As we have been seeing, all recommendations are the same:
Top 3 recommendations for user 1: [[b'Books' b'Dance' b'Sustainability']]
thing: [[ 0.029 0.029 -0.054]]
Top 3 recommendations for user 2: [[b'Books' b'Dance' b'Sustainability']]
thing: [[ 0.029 0.029 -0.054]]
what I'd really like to understand is which matrix operation(s) give us this output [ 0.029 0.029 -0.054]
I'm looking at the code in layers/factorized_top_k.py
which I think is generating the results:
scores = tf.linalg.matmul(queries, self._candidates, transpose_b=True)
values, indices = tf.math.top_k(scores, k=k)
return values, tf.gather(self._identifiers, indices)
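The three lines quoted above can be mirrored outside TensorFlow to see each intermediate result. What follows is a NumPy sketch of the same idea (score by dot product, keep the k highest, map indices back to identifiers), not the library's implementation; the function name `brute_force_top_k` and all values are made up, and tie-breaking may differ from `tf.math.top_k`.

```python
import numpy as np


def brute_force_top_k(queries, candidates, identifiers, k):
    """Score every candidate for every query and keep the k best per query."""
    # (num_queries, dim) @ (dim, num_candidates) -> (num_queries, num_candidates)
    scores = queries @ candidates.T
    # Negate so that argsort's ascending order puts the highest score first.
    order = np.argsort(-scores, axis=1)[:, :k]
    values = np.take_along_axis(scores, order, axis=1)
    return values, identifiers[order]


# Made-up embeddings purely for illustration.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[0.5, 0.1], [0.2, 0.9], [-0.3, 0.4]])
identifiers = np.array(["a", "b", "c"])
values, names = brute_force_top_k(queries, candidates, identifiers, k=2)
print(values)
print(names)
```

Two queries with identical embedding rows necessarily produce identical scores here, so identical recommendations for every user point at the query side of the product.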
I'm running the matmul in my own notebook like so:
scores = tf.linalg.matmul(model.user_model(np.array(["1"])), index._candidates, transpose_b=True)
print(scores)
which gives
tf.Tensor([[ 0.045 -0.019 -0.033]], shape=(1, 3), dtype=float32)
okay, this looks very suspicious:
query1 = model.user_model(np.array(["1"]))
print(query1)
query2 = model.user_model(np.array(["2"]))
print(query2)
tf.Tensor([[-0.001 -0.037]], shape=(1, 2), dtype=float32)
tf.Tensor([[-0.001 -0.037]], shape=(1, 2), dtype=float32)
so something must be wrong with the user model?
Please make sure your user ids are looked up in the vocabulary correctly, instead of all mapping to OOV. This is by far the most plausible explanation.
right, I will dig in further, but I am using identical code to the example (for managing and looking up user ids in the vocabulary), and also when one runs the movielens data on a smaller subset of the 10k we get the same behaviour with the quickstart code (all users get the same recommendations). I wonder if for some smaller datasets or particular distributions the user model fails to break symmetry ...?
When I say something is wrong with the user model I don't mean your underlying code is wrong - but more that I'm just not training it correctly, e.g. insufficient data. If there was a simple fix involving vocabulary lookup that would be great ... I will redouble my efforts to find such.
okay fixed it - gosh so silly:
thing, titles = index(np.array([user_ids_vocabulary.get_vocabulary()[1]]))
print(f"Top 10 recommendations for user {user_ids_vocabulary.get_vocabulary()[1]}: {titles}")
print(f"thing: {thing}")
thing, titles = index(np.array([user_ids_vocabulary.get_vocabulary()[2]]))
print(f"Top 10 recommendations for user {user_ids_vocabulary.get_vocabulary()[2]}: {titles}")
print(f"thing: {thing}")
thing, titles = index(np.array([user_ids_vocabulary.get_vocabulary()[3]]))
print(f"Top 10 recommendations for user {user_ids_vocabulary.get_vocabulary()[3]}: {titles}")
print(f"thing: {thing}")
Top 10 recommendations for user 990000123458: [[b'Graphic design' b'Public relations' b'Sculpture' b'Live music'
b'Sustainable' b'Religion' b'Menswear' b'TV' b'Musical' b'Dance']]
thing: [[2.602 2.331 2.126 1.528 1.526 1.361 1.287 1.059 0.997 0.82 ]]
Top 10 recommendations for user 990000123457: [[b'Personal Health' b'Beach holidays' b'Mindfulness' b'Gardening'
b'Volunteering' b'Outdoor activities' b'Design' b'Mental Health'
b'Adventure breaks' b'Film']]
thing: [[3.81 3.46 3.331 2.941 2.929 2.679 1.888 1.834 1.709 1.66 ]]
Top 10 recommendations for user 990000123456: [[b'Religion' b'Comedy' b'Musical' b'Live music' b'Investment' b'Couture'
b'Comics' b'Education' b'Dance' b'Mindfulness']]
thing: [[4.629 4.528 3.428 2.972 2.891 2.792 2.374 2.37 2.327 2.056]]
I had assumed that in the example a query such as:
np.array(["1"])
represented the 1st user, when it actually represents user id "1". I just had to look up the correct user ids, e.g.
np.array(["990000123456"])
and we get the expected behaviour ... but this has been very educational in understanding a lot more about the system. I'm still not quite seeing which matrix multiplications are leading to particular outputs, but I'm well on the way.
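The confusion between positions and ids can be reproduced without TensorFlow. The sketch below emulates a string lookup whose position 0 is the out-of-vocabulary bucket (an assumption that matches the '[UNK]' entry printed earlier in this thread); `make_lookup` is a hypothetical helper and the embedding values are made up.

```python
def make_lookup(vocabulary):
    """Emulate a string lookup whose position 0 is the out-of-vocabulary bucket."""
    table = {token: position for position, token in enumerate(vocabulary)}
    return lambda user_id: table.get(user_id, 0)


vocabulary = ["[UNK]", "990000123458", "990000123457", "990000123456"]
lookup = make_lookup(vocabulary)

# Toy embedding table, one row per vocabulary position (values are made up).
embeddings = [[0.0, 0.0], [0.4, -0.1], [-0.4, 0.1], [0.2, 0.3]]

for user_id in ["1", "2", "990000123456"]:
    row = lookup(user_id)
    print(user_id, "->", row, embeddings[row])
```

Because "1" and "2" are not user ids in this vocabulary, both resolve to row 0 and receive the same embedding, which is why every such query returned the same ranking.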
@GaetanDu your input about the uniqueness in the vocabulary was critical @maciejkula thanks for your input on this and for making the whole framework available
I wonder if it would be worth adding a note to the quick start docs about the user-ids? or perhaps having the first lookup be for a user id that couldn't be confused with an index?
put in a tiny PR to highlight the nature of the user id lookup https://github.com/tensorflow/recommenders/pull/347
and just to summarise my understanding now that I've run on a small super simple example, with embeddings using k=2 after 3 epochs training:
The user id embedding maps user ids onto 2 dims, e.g.
[ 0.016, -0.027] ==> UNK
[ 0.454, -0.095] ==> 123456
[-0.44 , 0.132] ==> 456788
The interest (in my data set) embedding maps interests onto 2 dims
[ 0.258, -0.051] ==> Dance
[ 0.262, -0.019] ==> Books
[-0.519, 0.144] ==> Sustainability
Then calculating suggested interests involves selecting a user, and doing a matrix multiplication to assess level of interest in each subject, which can then be ranked ...
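The calculation described above can be worked through directly with the numbers from this summary. This is a plain-Python sketch (the helper name `rank_interests` is made up) that computes one dot product per interest and sorts the results:

```python
# Embeddings copied from the summary above (2 dimensions each).
users = {
    "UNK": (0.016, -0.027),
    "123456": (0.454, -0.095),
    "456788": (-0.44, 0.132),
}
interests = {
    "Dance": (0.258, -0.051),
    "Books": (0.262, -0.019),
    "Sustainability": (-0.519, 0.144),
}


def rank_interests(user_vector):
    """Dot-product score per interest, sorted from highest to lowest."""
    scores = {
        name: sum(u * v for u, v in zip(user_vector, vector))
        for name, vector in interests.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for user_id, vector in users.items():
    print(user_id, [(name, round(score, 3)) for name, score in rank_interests(vector)])
```

With these values user 123456 ranks Dance and Books ahead of Sustainability, while user 456788 ranks Sustainability first, so distinct user vectors do yield distinct orderings.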
The weights get randomly initialized to small values, and after training the weights for the UNK items are essentially unchanged, but the other weights have increased by approximately an order of magnitude. In terms of explainability, I guess each embedding dimension represents some feature of the data. So in the above example we might say that user 123456 is 0.454 in terms of feature A and -0.095 in terms of feature B, and that Dance and Books are more feature A than B, and vice versa for Sustainability, so user 123456 matches Dance and Books more based on a feature A "connection" ...
if that's roughly correct I want to move on to better understanding the objective function tfrs.tasks.Retrieval ...
@williamberrios was using indices instead of ids your problem too?
Hi all, reopening this thread because I'm finding that using the basic_ranking tutorial with the movielens100k dataset produces identical movie rankings for every user. The predicted ratings vary from user to user, but the overall order remains the same. I tried with the movielens1m dataset and found that while the rankings now varied slightly, the same 5 movies were predicted to be in every user's top 5. I tried this model with my own dataset of 200k users and 5k items and found the exact same problem. It seems like the model is collapsing to the mean and becoming a variant of a "most popular recommender" that takes ratings into account.
I've tried varying the learning rate, layer regularization, dropout, reducing to one layer, and even removing the activation function to essentially make it a linear model. Regardless, it outputs the same ranking for every user.
However, I tried running all of these datasets through various matrix factorization based collaborative filtering algos like Alternating Least Squares and Bayesian Personalized Ranking and found that each user had unique and relevant recommendations, so I don't think its a dataset problem.
Finally, I am referencing user_ids directly from unique_user_ids
so it's not an indices problem.
Any insight is appreciated 🙏 Jason
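To quantify how far a model has collapsed towards one global ranking, it may help to count distinct top-k lists across users. This is a minimal sketch, assuming the recommendations have already been collected into a dict of user id to ordered item list; the helper name `ranking_diversity` and the `recs` data are hypothetical.

```python
def ranking_diversity(recommendations):
    """Summarise how much top-k lists differ across users.

    `recommendations` maps a user id to an ordered list of recommended item ids.
    """
    rankings = {tuple(items) for items in recommendations.values()}
    item_sets = {frozenset(items) for items in recommendations.values()}
    return {
        "users": len(recommendations),
        "distinct_rankings": len(rankings),    # order-sensitive
        "distinct_item_sets": len(item_sets),  # order-insensitive
    }


# Hypothetical top-3 lists: same items for everyone, order differs for one user.
recs = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B", "C"],
    "u3": ["B", "A", "C"],
}
report = ranking_diversity(recs)
print(report)
```

A single distinct item set across many users would match the "most popular recommender" behaviour described above, whereas a count close to the number of users would indicate personalised output.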
Having successfully got my data into the correct format I've been able to run the quickstart recommender on my own dataset. However for some reason on a given run every user gets identical recommendations for the same item over and over again.
Looking at the movielens data in the same I see that it has the following properties:
My own data set has the following properties
I think I've gotten the data into the correct format. For example, the movie data mappings are like this:
while I've now got the interest data similarly structured like so
however the recommended interests with my dataset are locked into the same thing each time.
Running with the movielens data there's a nice spread of recommendations, and the training output is like this:
and we can see a nice pattern of recommendations for users:
but for the interests data we get this:
and here's the details for the first two users:
On different runs the recommendation might recommend a different interest, but it always gets stuck with the same one for all users. I've tried longer training runs, but I'm starting to wonder if there's a need to have some minimum number of "ratings" per user? I think that's the main difference between the data sets, i.e. 100 ratings for each user in the movielens dataset, but only five in my interests dataset. Or could it be the number of possible interests is just too small?
Or I'm making some stupid mistake in the code (in the interest vocabulary lookup table perhaps?). What are the requirements on the dataset size to allow this model to work?
Is there anything in the output I show above to indicate what's going wrong? The net getting stuck in a local minimum perhaps?
I've tried longer training runs but that doesn't seem to make any difference. Perhaps the settings for the retrieval model need to be different given the fewer "ratings" or some other difference in the dataset proportions?