mberr / torch-ppr

(Personalized) Page-Rank computation using PyTorch
MIT License
87 stars 5 forks source link

PPR for Bipartite graph in recommendation #25

Open riyaj8888 opened 1 year ago

riyaj8888 commented 1 year ago

can u provide one simple example , how to use it for bipartite graph for recommending items to for users using this PPR code.

thanks

mberr commented 1 year ago

Hi @riyaj8888 ,

I am not very familiar with the latest research on recommender systems, so you might want to take a look at that first. One particular question would be whether you need to pre-process your data to account for the sparsity of user-item interactions (e.g., cluster users or items into groups, and then operate on a densified user-cluster-item-cluster interaction matrix).

That said, this library only gives you a (fast) way to compute the (personalized) page rank, which can be interpreted as a relevance score. A quick application I can think of is to calculate PPR scores for a given user, and then recommend items that have a high score. To do this, you need to

  1. construct the adjacency matrix; there are several possible formats, but the simplest might be an edge index, see https://torch-ppr.readthedocs.io/en/stable/usage.html#torch_ppr.api.personalized_page_rank

    import torch
    edge_index = torch.as_tensor(data=[
    (user_id, item_id + num_users)
    for user_id, item_id in user_item_interactions
    ]).t()

    Note that you have to make sure that the node IDs are unique, i.e., a user with id i does not get the same node id as the item i.

  2. calculate the personalized page rank scores for a user of interest

    from torch_ppr import personalized_page_rank
    scores = personalized_page_rank(edge_index=edge_index, indices=[user_id_of_interest])

    Note that this will be a vector of size num_users + num_items, i.e., you also get scores for other users and not just items.

  3. e.g., look at the top-k items

    best_item_ids = scores[num_users:].topk(k=10, sorted=True)
riyaj8888 commented 1 year ago

Thanks, in step 3. it should be scores[num_items:] ri8?

mberr commented 1 year ago

in step 3. it should be scores[num_items:] ri8?

No. With the code above, you use the node ids [0, ..., num_users) for user nodes, and [num_users, ..., num_users + num_items) for item nodes. Thus, we look at the scores for nodes starting at num_users until the end.

You could also switch that, but then you would need to adjust step 2, too, since indices has to be a node index, i.e., you need to translate the user index to a node index first.