Closed EdwardALockhart closed 2 years ago
I would suggest you think in terms of queries and candidates. A query is the context for which you want to retrieve a set of candidates. The query input features should not contain information about the target candidate that is not available at inference time.
In the example above, the user is performing a search. The query is the USER_ID, the user's country USER_RESIDENCE and the desired travel class CABIN_TYPE. All of this can be provided at inference time.
So the distinction is pretty clear. You can add information about the user or the recommendation context to the query tower and you can add item features to the candidate tower.
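As a rough illustration of that split, the sketch below routes the fields of one interaction record to the two towers. The feature names USER_ID, USER_RESIDENCE and CABIN_TYPE come from the example above; ITEM_ID and the `split_features` helper are assumptions made up for this sketch, not part of TFRS.

```python
# Query-side features: everything known about the user and the search context
# at inference time (names taken from the example discussed above).
QUERY_FEATURES = ("USER_ID", "USER_RESIDENCE", "CABIN_TYPE")

# Candidate-side features: properties of the item itself.
# ITEM_ID is an assumed name for the candidate identifier.
CANDIDATE_FEATURES = ("ITEM_ID",)


def split_features(row):
    """Split one interaction record into query-tower and candidate-tower inputs."""
    query = {name: row[name] for name in QUERY_FEATURES}
    candidate = {name: row[name] for name in CANDIDATE_FEATURES}
    return query, candidate


# Toy record with made-up values.
row = {
    "USER_ID": "u1",
    "USER_RESIDENCE": "United States",
    "CABIN_TYPE": "economy",
    "ITEM_ID": "airline_a",
}
query, candidate = split_features(row)
```

At inference time only the `query` dictionary has to be supplied; the candidate side is computed once for every item and stored in the index.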
I see, this makes a lot more sense now when you try to write the code and get recommendations for a specific user - you also have to supply the other characteristics required by the query model, some of which you couldn't possibly know at inference time such as the bought item and its characteristics.
Thanks!
Though now that I know the user and their characteristics at inference time (inputs into the Query Model), I can supply the item and its characteristics as candidates (Candidate Model). How do I produce a recommendation in this instance?
I can't seem to find an example of running this prediction stage with data other than item IDs as a list of candidates and user ID as a query. My goal here is to learn that some items are related to one another by categories and the same for users but all that should be recommended is an item.
Below is the code that I am using with some lines omitted for clarity
ratings = tf.data.Dataset.from_tensor_slices(df[['user', 'item', 'item_type', 'user_type', 'strength']].to_dict(orient = 'list'))
items = tf.data.Dataset.from_tensor_slices(df[['item', 'item_type']].drop_duplicates().to_dict(orient = 'list'))
...
class QueryModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.user_layers = user_layers
        self.user_type_layers = user_type_layers
    def call(self, features):
        return tf.concat([self.user_layers(features["user"]),
                          self.user_type_layers(features["user_type"])], axis = 1)
class CandidateModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.item_layers = item_layers
        self.item_type_layers = item_type_layers
    def call(self, features):
        return tf.concat([self.item_layers(features["item"]),
                          self.item_type_layers(features["item_type"])], axis = 1)
query_model = tf.keras.Sequential(QueryModel())
candidate_model = tf.keras.Sequential(CandidateModel())
retrieval_task = tfrs.tasks.Retrieval(metrics = tfrs.metrics.FactorizedTopK(items.batch(128).map(candidate_model), ks = [5, 10]))
...
class RetrievalModel(tfrs.models.Model):
    def __init__(self, query_model, candidate_model, retrieval_task):
        super().__init__()
        self.query_model: tf.keras.Model = query_model
        self.candidate_model: tf.keras.Model = candidate_model
        self.retrieval_task: tf.keras.layers.Layer = retrieval_task
    def compute_loss(self, features, training = False):
        query_embeddings = self.query_model(features)
        positive_candidate_embeddings = self.candidate_model(features)
        return self.retrieval_task(query_embeddings,
                                   positive_candidate_embeddings,
                                   compute_metrics = not training)
# Train and test
retrieval_model = RetrievalModel(query_model,
                                 candidate_model,
                                 retrieval_task)
retrieval_model.compile(optimizer = tf.keras.optimizers.Adagrad(learning_rate = 0.1))
retrieval_model.fit(train,
                    validation_data = test,
                    epochs = 10)
retrieval_results = retrieval_model.evaluate(test, return_dict = True)
# Get candidate recommendations
index = tfrs.layers.factorized_top_k.BruteForce(retrieval_model.query_model)
index.index_from_dataset(tf.data.Dataset.zip((items.batch(100),
                                              items.batch(100).map(retrieval_model.candidate_model))))
When I try to generate the recommendations, I get an error on the last line:
  File "/tmp/ipykernel_3209/2589562812.py", line 1, in <cell line: 1>
    index.index_from_dataset(tf.data.Dataset.zip((items.batch(100),
  File "/mnt/e0fdda2b-8695-46fc-b7ef-788e3852324c/DataG7/Computing/Python/VirtualEnvironments/tf/lib/python3.10/site-packages/tensorflow_recommenders/layers/factorized_top_k.py", line 197, in index_from_dataset
    _check_candidates_with_identifiers(candidates)
  File "/mnt/e0fdda2b-8695-46fc-b7ef-788e3852324c/DataG7/Computing/Python/VirtualEnvironments/tf/lib/python3.10/site-packages/tensorflow_recommenders/layers/factorized_top_k.py", line 127, in _check_candidates_with_identifiers
    if candidates_spec.shape[0] != identifiers_spec.shape[0]:
AttributeError: 'dict' object has no attribute 'shape'
I can bring across both attributes by simply concatenating them, as in the code below (adapted from https://github.com/tensorflow/recommenders/issues/318#issuecomment-1102099467). Does this seem sensible? I fear the resulting format might cause problems at the ranking stage later, when these recommendations are re-ranked.
index = tfrs.layers.factorized_top_k.BruteForce(retrieval_model.query_model)
index.index_from_dataset(items.batch(100).map(lambda x: (x['item'] + x['item_type'],
                                                         retrieval_model.candidate_model(x))))
I can get out a recommendation as item + item_type for a single user with characteristics using
query = {"user": tf.constant(["user_MarcGimbel"]), "user_type": tf.constant(["United States"])}
affinity_scores, recommended_items = index(query)
Supplying user and user_type at inference time is no problem. What I am unsure about is the candidate model using both item and item_type: I can't think of how else it could learn that they are related, even though I only want to recommend items, without the item_type information.
I'm quite new to TensorFlow, so the fact that I'm at this stage is a testament to how good the tutorials are, just some help is required with niche areas that aren't covered.
Thanks
Yep, exactly. Thinking about how you intend to use the model typically can clear up what features you can use and where.
You are misunderstanding the index_from_dataset method.
You need to map your dataset so that it consists of tuples (item_id, item_vector), as follows.
index = tfrs.layers.factorized_top_k.BruteForce(retrieval_model.query_model)
index.index_from_dataset(items.batch(100).map(
    lambda inputs: (inputs['item'], retrieval_model.candidate_model(inputs))
))
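To show why the first element of each tuple has to be a plain identifier rather than the whole feature dictionary, here is a pure-Python sketch of what a brute-force index conceptually does. It is not the TFRS implementation; `build_index`, `top_k` and the toy vectors are hypothetical and exist only for illustration.

```python
def build_index(pairs):
    """Store (identifier, vector) pairs; the identifier is only a label."""
    return list(pairs)


def top_k(index, query_vector, k=2):
    """Score every candidate by dot product and return the k best (score, id) pairs."""
    scored = [
        (sum(q * c for q, c in zip(query_vector, vector)), identifier)
        for identifier, vector in index
    ]
    # Sort by score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy candidate embeddings (made-up numbers).
index = build_index([
    ("item_a", [1.0, 0.0]),
    ("item_b", [0.0, 1.0]),
    ("item_c", [0.7, 0.7]),
])

results = top_k(index, [1.0, 0.2], k=2)
recommended_ids = [identifier for _, identifier in results]
```

The identifier never enters the scoring; it is only handed back alongside the score, which is why any unique label per candidate will do.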
Thanks for this. Because I'm supplying metadata with my items in the form of item + item_type, when I map items as you suggest the index obviously contains duplicates. So for a given query the same item can appear multiple times, because the item_type that would have differentiated the entries is omitted.
Is there no way to supply items with metadata to the model while just getting unique items out? Or do you have to supply metadata (fudging the index_from_dataset bit as I did) or just supply items as you have suggested and de-duplicate the final list?
The id is used for nothing more than identifying the recommended item. It should be unique for each candidate. I don't fully understand how you can have the same item with different type, but if you need to create a new composite id to make it work then that's fine.
I completely understand now. My items were a higher-level category (think airlines) and my item types were a level below (seat types under an airline category). Since I was selecting the higher level as my item, I captured all of the different combinations of it with its subtypes, so when I removed the subtypes the higher categories were obviously duplicated. My problem was selecting that higher category as an item. I should either concatenate them and treat the combinations as distinct items (if I wanted to predict airline and seat type), or just omit any information below my items (removing the subtypes and ending up with a unique airline list, if I wanted to predict that), as it wouldn't help learn any relationship between the higher categories anyway.
Thank you so much!
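The two options described above can be sketched in plain Python. The rows, the `|` separator and the variable names are made up for illustration; the same idea would apply to the item table before it is turned into a dataset.

```python
# Toy item table with one row per (item, item_type) combination.
rows = [
    {"item": "airline_a", "item_type": "economy"},
    {"item": "airline_a", "item_type": "business"},
    {"item": "airline_b", "item_type": "economy"},
]

# Option 1: predict airline AND seat type.
# Build a composite identifier so that every candidate is unique.
composite_ids = [f"{row['item']}|{row['item_type']}" for row in rows]

# Option 2: predict the airline only.
# Drop the subtype and keep each item once, preserving first-seen order.
unique_items = list(dict.fromkeys(row["item"] for row in rows))
```

Either way, the candidates handed to the index end up with one unique identifier each, so a query cannot return the same identifier twice.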
Hi,
I have been going through the tutorials and one question remains which is the incorporation of user and item metadata.
I have seen examples where both user and item metadata are incorporated into the User Model (Query Model - the interchangeable names can be confusing). For example here https://github.com/drtinumohan/tfrs_amazon_dataset/blob/main/tfrs_amazon.ipynb, there are user attributes (residence) and item attributes (cabin type) being incorporated into the User Model. Why not incorporate them both or one of them in the Item Model (Candidate Model)?
Is there any standard approach detailing where such features should be incorporated and how they might fit into say the basic quick start example here? https://www.tensorflow.org/recommenders/examples/quickstart
I know that both layers will have to have vocabs produced to convert to integers, but beyond that, what rules should be followed in terms of slotting these metadata into a simple retrieval model like in the quick start example?
Thanks!