tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0
1.83k stars 274 forks

How to add more features to the item_model #356

Open Jobo-RS opened 3 years ago

Jobo-RS commented 3 years ago

I added new features to the user model following the method in the tutorial, but using the same method to add new features to the item model raises errors. What is the reason, and how can I add more item features to the item model? The code is as follows.

user_model:

`class UserModel(tf.keras.Model):

    def __init__(self):
        super().__init__()

        # User ID embedding layer
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
        ])
        # Timestamp feature layers (bucketized embedding)
        self.timestamp_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
            tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
        ])

        self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_timestamp.adapt(bhv_time)

        # Purchasing-power feature
        self.normalized_age = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_age.adapt(bhv_value)

    def call(self, inputs):
        # inputs is a dict of feature tensors
        return tf.concat([
            self.user_embedding(inputs['user_id']),
            self.timestamp_embedding(inputs['bhv_time']),
            self.normalized_timestamp(inputs['bhv_time']),
            self.normalized_age(inputs['bhv_value']),
        ], axis=1)`

item_model:

`class ItemModel(tf.keras.Model):

    def __init__(self):
        super().__init__()

        max_tokens = 1000  # maximum vocabulary size

        self.title_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_item_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_item_titles) + 1, 32),
        ])

        # Converts title text into integer token sequences
        self.title_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
            max_tokens=max_tokens)

        self.title_text_embedding = tf.keras.Sequential([
            self.title_vectorizer,
            tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D(),  # global average pooling
        ])
        # self.title_vectorizer.adapt(titles)

    def call(self, inputs):
        # inputs is a dict of feature tensors
        return tf.concat([
            self.title_embedding(inputs['item_id']),
            self.title_text_embedding(inputs['title']),
        ], axis=1)`
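
The `adapt` call that is commented out in the item model is what gives a `TextVectorization` layer its vocabulary, so it is worth checking that the title branch works on its own before wiring it into the tower. The following is only a minimal sketch under stated assumptions: the three titles are made up, and it uses the non-experimental `tf.keras.layers.TextVectorization` path found in newer TensorFlow releases instead of the `experimental.preprocessing` path used in the post.

```python
import tensorflow as tf

# Hypothetical titles standing in for the real item data.
titles = tf.data.Dataset.from_tensor_slices(
    ["red shoes", "blue hat", "green wool scarf"])

max_tokens = 1000

# The vectorizer learns its vocabulary from the data via adapt().
title_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_tokens)
title_vectorizer.adapt(titles.batch(2))

embedding = tf.keras.layers.Embedding(max_tokens, 32)
pooling = tf.keras.layers.GlobalAveragePooling1D()

# Token ids are padded with 0 up to the longest title in the batch.
tokens = title_vectorizer(
    tf.constant(["red shoes", "blue hat", "green wool scarf"]))
mask = tf.not_equal(tokens, 0)  # ignore padding positions when averaging

# One 32-dimensional vector per title.
title_vectors = pooling(embedding(tokens), mask=mask)
print(title_vectors.shape)
```

Here the padding mask is passed to the pooling layer explicitly rather than relying on `mask_zero=True`, which keeps the sketch independent of how a given Keras version propagates masks.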

`class ItemlensModel(tfrs.models.Model):

    def __init__(self):
        super().__init__()
        # Query model
        # self.query_model = UserModel()
        self.query_model = tf.keras.Sequential([
            UserModel(),
            tf.keras.layers.Dense(32)
        ], name='query_name')

        # Candidate model
        # self.candidate_model = ItemModel()
        self.candidate_model = tf.keras.Sequential([
            ItemModel(),
            tf.keras.layers.Dense(32)
        ])

        # Retrieval task
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=items.batch(128).map(self.candidate_model),
            )
        )

    # Compute the loss
    def compute_loss(self, features, training=False):
        query_embeddings = self.query_model({
            'user_id': features['user_id'],
            'bhv_time': features['bhv_time'],
            'bhv_value': features['bhv_value'],
        })

        item_embeddings = self.candidate_model({
            'item_id': features['item_id'],
            'title': features['title'],
        })

        return self.task(query_embeddings, item_embeddings)`

The user model part works without problems, but the item model part raises the following error: [image]

markharding commented 3 years ago

I also ran into this issue, and managed to fix it by making items a dict like:

items = items.map(lambda x: {
    "item_id": x['item_id'],
    "item_title": x['item_title'],
}).cache()

Jobo-RS commented 3 years ago

I have solved this problem. The main issue was the data format early in the pipeline: the item data needs to be converted into a dictionary first and then into a dataset (if your data is a pandas DataFrame).

data_item = pd.read_csv('../data/item_title_new.csv', nrows = 10000, encoding = 'utf-8')
# Item attribute features; format conversion, DataFrame -> Dataset
items = tf.data.Dataset.from_tensor_slices(dict(data_item))
titles = tf.data.Dataset.from_tensor_slices((data_item['title']))

@markharding
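
The conversion described in this comment can be sketched as follows. This is a hedged illustration, not the poster's actual pipeline: the small DataFrame stands in for `item_title_new.csv`, and `item_tower_stub` is a hypothetical placeholder for the item tower that only shows the dict-style feature lookup.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the CSV loaded with pd.read_csv in the comment.
data_item = pd.DataFrame({
    "item_id": ["i1", "i2", "i3"],
    "title": ["red shoes", "blue hat", "green scarf"],
})

# dict(DataFrame) maps column name -> column, so each dataset element
# becomes a dict of tensors keyed by column name.
items = tf.data.Dataset.from_tensor_slices(dict(data_item))
print(items.element_spec)

def item_tower_stub(features):
    # Hypothetical placeholder for ItemModel.call: it reads features by key,
    # which only works when the dataset elements are dicts.
    return tf.strings.join(
        [features["item_id"], features["title"]], separator="|")

# Same pattern as items.batch(128).map(self.candidate_model) in the model.
batched = items.batch(2).map(item_tower_stub)
for batch in batched:
    print(batch.numpy())
```

Because every element is a dict, the function mapped over `items` can index `features["item_id"]` and `features["title"]` in the same way `ItemModel.call` does.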

deeplearningnrs commented 3 years ago

I have solved this problem. The main issue was the data format early in the pipeline: the item data needs to be converted into a dictionary first and then into a dataset (if your data is a pandas DataFrame).

data_item = pd.read_csv('../data/item_title_new.csv', nrows = 10000, encoding = 'utf-8')
# Item attribute features; format conversion, DataFrame -> Dataset
items = tf.data.Dataset.from_tensor_slices(dict(data_item))
titles = tf.data.Dataset.from_tensor_slices((data_item['title']))

@markharding

How did you solve it? I have the same problem with the user model: I tried to add a location feature and got an error.

siyu1992 commented 2 years ago

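@deeplearningnrs Could you give me some guidance?

After I added features on the item side, the evaluation metrics stopped working entirely. Have you run into anything similar?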

shainaraza commented 2 years ago

Yes, it didn't work for me either.

hugoferrero commented 2 years ago

I also ran into this issue, and managed to fix it by making items a dict like:

items = items.map(lambda x: {
    "item_id": x['item_id'],
    "item_title": x['item_title'],
}).cache()

Yes, I solved it that way too. The items dataset needs to provide the same features (dict keys) that the item tower reads.
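
One way to check this before training is to compare the keys of the candidate dataset with the keys the item tower reads. The sketch below is only an illustration under assumptions: `REQUIRED_KEYS`, `missing_keys`, and the sample data are hypothetical and not part of TensorFlow Recommenders.

```python
import tensorflow as tf

# Hypothetical: the keys that ItemModel.call reads from its input dict.
REQUIRED_KEYS = ("item_id", "title")

# Hypothetical raw item data whose column names differ from the tower's keys.
raw_items = tf.data.Dataset.from_tensor_slices({
    "item_id": ["i1", "i2"],
    "item_title": ["red shoes", "blue hat"],
    "price": [9.5, 3.0],
})

def missing_keys(dataset, required):
    """Return the required feature keys that the dataset elements lack."""
    spec = dataset.element_spec
    if not isinstance(spec, dict):
        # Elements are plain tensors or tuples, so no key lookup can work.
        return list(required)
    return [key for key in required if key not in spec]

print(missing_keys(raw_items, REQUIRED_KEYS))  # ['title']

# Select and rename features so each element matches what the tower expects.
items = raw_items.map(lambda x: {
    "item_id": x["item_id"],
    "title": x["item_title"],
})

print(missing_keys(items, REQUIRED_KEYS))  # []
```

The resulting `items` dataset is the one that would be passed to `items.batch(128).map(self.candidate_model)`, since each element now carries exactly the keys the tower looks up.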

rohitverma92outlook commented 1 year ago

Hi @hugoferrero, I tried your suggestion but it didn't work for me; maybe my understanding is wrong. Could you please elaborate on your suggestion and explain it in more detail, with code?