massquantity / LibRecommender

Versatile End-to-End Recommender System
https://librecommender.readthedocs.io/
MIT License

Embedding Amount between SVDPP and FM #426

Open skylersky opened 9 months ago

skylersky commented 9 months ago

Hi, I have a question about the number of output embeddings from SVDPP versus FM. I am fitting SVDPP and FM models on the same data, where the user count is X and the item count is Y. From SVDPP, using get_user_embedding and get_item_embedding, I get X and Y embedding vectors. For FM, after saving the model and loading it back, the numbers of user and item embedding vectors I get are X+1 and Y+1, and I am not sure why that happens.

And one more thing I want to clarify: the output embeddings are ordered (ascending) by user id and item id, right?

massquantity commented 9 months ago

In fact, both SVDPP and FM keep X+1 user embedding vectors and Y+1 item embedding vectors in their internal representations; the extra vector at the end is used for cold-start users/items. The get_user_embedding and get_item_embedding functions remove that last vector.
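
For example, a quick shape check (a minimal sketch, assuming svdpp is a fitted SVDpp model and data_info was returned by DatasetPure.build_trainset; per the description above, calling get_user_embedding with no argument returns all user embeddings):

    # all user embeddings, with the trailing cold-start row already removed
    all_user_embeds = svdpp.get_user_embedding()
    assert all_user_embeds.shape[0] == data_info.n_users  # X rows, not X + 1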

The output embeddings are ordered by inner user ids and item ids, which are typically not the ids in your original data. That's because the original ids may be strings like "YUJUYSD7YH67SDEW", which have no natural order.
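
The mapping between the two id spaces lives on the data_info object. A small sketch (user2id is used in the code further below; its inverse, id2user, is an assumption here):

    inner_id = data_info.user2id["YUJUYSD7YH67SDEW"]  # original id -> inner row index
    original_id = data_info.id2user[inner_id]         # inner row index -> original id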

When passed a single id, the get_user_embedding and get_item_embedding functions accept the original id, so you can use a for loop to collect the original ids and their corresponding embedding vectors:

original_ids = ["aa", "bb", "rr"]
# model is the fitted SVDPP/FM model; each call takes an original id
id_and_embeds = {i: model.get_user_embedding(i) for i in original_ids}

skylersky commented 9 months ago

For SVDPP, it's fine to get embeddings in a for loop with get_user_embedding. What should I do to get embeddings for the FM method? I was thinking I could first encode my user_id and item_id into ordered integer ids, so that the output embeddings would follow the order of the encoded integer ids.

massquantity commented 8 months ago
  1. Get the embeddings in inner-id order:

    import tensorflow as tf
    from libreco.algorithms import FM
    from libreco.data import DatasetFeat

    train_data, data_info = DatasetFeat.build_trainset(...)
    fm = FM(...)
    fm.fit(train_data, ...)
    # raw embedding variables; each has one extra trailing cold-start row
    user_embeds = fm.sess.run(tf.compat.v1.get_default_graph().get_tensor_by_name("embedding/user_embeds_var:0"))
    item_embeds = fm.sess.run(tf.compat.v1.get_default_graph().get_tensor_by_name("embedding/item_embeds_var:0"))
  2. Get the id-embedding mapping:

    original_user_ids = ["aa", "ee"]
    user_ids_embeds = dict()
    for user_id in original_user_ids:
        inner_id = data_info.user2id[user_id]  # original id -> inner row index
        user_ids_embeds[user_id] = user_embeds[inner_id]
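
If you want the embeddings of every user rather than a hand-picked list, here is a hedged variant of step 2 (again assuming data_info.id2user exists as the inverse of user2id; note that user_embeds read from the raw variable still carries the extra trailing cold-start row):

    # inner ids 0 .. n_users - 1 are real users; the last row of user_embeds is cold-start
    all_ids_embeds = {data_info.id2user[i]: user_embeds[i] for i in range(data_info.n_users)}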