PPR for Bipartite graph in recommendation

riyaj8888 commented 1 year ago

can u provide one simple example , how to use it for bipartite graph for recommending items to for users using this PPR code.

thanks

mberr commented 1 year ago

Hi @riyaj8888 ,

I am not very familiar with the latest research on recommender systems, so you might want to take a look at that first. One particular question would be whether you need to pre-process your data to account for the sparsity of user-item interactions (e.g., cluster users or items into groups, and then operate on a densified user-cluster-item-cluster interaction matrix).

That said, this library only gives you a (fast) way to compute the (personalized) page rank, which can be interpreted as a relevance score. A quick application I can think of is to calculate PPR scores for a given user, and then recommend items that have a high score. To do this, you need to

construct the adjacency matrix; there are several possible formats, but the simplest might be an edge index, see https://torch-ppr.readthedocs.io/en/stable/usage.html#torch_ppr.api.personalized_page_rank
```
import torch
edge_index = torch.as_tensor(data=[
(user_id, item_id + num_users)
for user_id, item_id in user_item_interactions
]).t()
```
Note that you have to make sure that the node IDs are unique, i.e., a user with id i does not get the same node id as the item i.
calculate the personalized page rank scores for a user of interest
```
from torch_ppr import personalized_page_rank
scores = personalized_page_rank(edge_index=edge_index, indices=[user_id_of_interest])
```
Note that this will be a vector of size num_users + num_items, i.e., you also get scores for other users and not just items.

e.g., look at the top-k items

best_item_ids = scores[num_users:].topk(k=10, sorted=True)

riyaj8888 commented 1 year ago

Thanks, in step 3. it should be scores[num_items:] ri8?

mberr commented 1 year ago

in step 3. it should be scores[num_items:] ri8?

No. With the code above, you use the node ids [0, ..., num_users) for user nodes, and [num_users, ..., num_users + num_items) for item nodes. Thus, we look at the scores for nodes starting at num_users until the end.

You could also switch that, but then you would need to adjust step 2, too, since indices has to be a node index, i.e., you need to translate the user index to a node index first.

mberr / torch-ppr

PPR for Bipartite graph in recommendation #25