Open LucasSDresl opened 3 years ago
That looks mostly correct to me except for a few things:
weight should be converted to a float tensor.
customer_id and vendor_id should be mapped to consecutive indices via torch.unique(return_inverse=True).
My corrections:
weight = (torch.Tensor(df['weight_of_edg'].values)).float()
c, cidx = torch.unique(input=customer_id, return_inverse=True)
v, vidx = torch.unique(input=vendor_id, return_inverse=True)
edge_index = torch.Tensor((np.vstack((cidx, vidx)))).long()
x_s = cidx.unique()
x_t = vidx.unique()
data = BipartiteData(edge_index, x_s=x_s, x_t=x_t)
data.edge_attr = weight
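To make the re-indexing step concrete, here is a minimal sketch of what torch.unique(return_inverse=True) returns. The ID values are made up for illustration, and torch.stack is used in place of the NumPy round trip purely as a stylistic choice.

```python
import torch

# Hypothetical raw IDs, one entry per order (i.e. per dataframe row).
customer_id = torch.tensor([10, 42, 10, 7])
vendor_id = torch.tensor([500, 500, 300, 900])

# unique() returns the sorted distinct IDs; return_inverse=True also returns,
# for every original entry, its position in that sorted list.
c, cidx = torch.unique(customer_id, return_inverse=True)
v, vidx = torch.unique(vendor_id, return_inverse=True)

# Row 0 indexes customers (source side), row 1 indexes vendors (target side).
edge_index = torch.stack([cidx, vidx], dim=0)

num_customers, num_vendors = c.numel(), v.numel()
```

The inverse indices run from 0 to num_customers - 1 (and 0 to num_vendors - 1), and c[cidx] recovers the original IDs, so c and v can be kept around to map predictions back to real customers and vendors.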
Does this look correct? If I want to pass node features, how should I pass them? Could you please point me to an example (if you have one) of how to pass bipartite data into an architecture?
Depending on the size of your data, a natural way to handle feature-less graphs is to encode the feature matrix as an identity matrix. Does that work for you?
The data is not big (I am creating random data for learning). I was thinking of passing 4 or 5 features for vendor_id and customer_id and using some GCN or GNN architecture for a bipartite graph (I didn't see an example using bipartite graphs in the examples section). If I want to recommend a vendor to customer i, is it okay to treat the problem as link prediction? How should I encode the graph in the bipartite case?
If that is the case, I would go for the identity matrix as input node feature matrix:
edge_index_T = torch.stack([edge_index[1], edge_index[0]], dim=0) # Transposed/Reversed graph.
data.customer = torch.eye(num_customers)  # One-hot row per customer.
data.vendor = torch.eye(num_vendors)  # One-hot row per vendor.
conv1 = SAGEConv((num_customers, num_vendors), 64)
new_vendor_x = conv1((data.customer, data.vendor), edge_index).relu()
conv2 = SAGEConv((num_vendors, num_customers), 64)
new_customer_x = conv2((data.vendor, data.customer), edge_index_T).relu()
# Repeat with new_vendor_x and new_customer_x:
conv3 = SAGEConv((64, 64), 128)
new_vendor_x2 = conv3((new_customer_x, new_vendor_x), edge_index).relu()
# ...
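As a side note on why identity features are a reasonable stand-in for missing node features: a linear map applied to one-hot rows simply picks out one learnable vector per node. The sketch below shows this with a plain bias-free torch.nn.Linear; the sizes are arbitrary, and it is only an illustration of the idea, not of what any particular convolution layer does internally.

```python
import torch

torch.manual_seed(0)
num_customers, hidden = 4, 8

# Identity matrix: row i is the one-hot encoding of customer i.
x_customer = torch.eye(num_customers)

# A bias-free linear layer applied to one-hot rows selects columns of its weight.
lin = torch.nn.Linear(num_customers, hidden, bias=False)
out = lin(x_customer)  # shape [num_customers, hidden]

# Same values as reading row i of a [num_customers, hidden] lookup table.
table = lin.weight.t()
same = torch.allclose(out, table)
```

This is also why the approach depends on the size of the data: the identity matrix has num_customers x num_customers entries, so it only stays practical for small graphs.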
For the final link prediction, it's a good idea to compute edge representations based on the hidden node representations after a number of convolutions:
edge_attr = torch.cat([customer_x[edge_index[0]], vendor_x[edge_index[1]]], dim=-1)
prediction = MLP(edge_attr)
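A self-contained sketch of this edge-scoring step might look as follows. Random tensors stand in for the hidden node representations, and the MLP layout (two linear layers with 32 hidden units) is an assumption chosen only for illustration.

```python
import torch

torch.manual_seed(0)
num_customers, num_vendors, hidden = 3, 2, 64

# Stand-ins for the hidden node representations produced by the convolutions.
customer_x = torch.randn(num_customers, hidden)
vendor_x = torch.randn(num_vendors, hidden)

# Candidate (customer, vendor) pairs to score.
edge_index = torch.tensor([[0, 1, 2, 2],
                           [0, 1, 0, 1]])

# Gather both endpoints of every edge and concatenate along the feature axis.
edge_repr = torch.cat([customer_x[edge_index[0]], vendor_x[edge_index[1]]], dim=-1)

# Hypothetical scoring head; layer sizes are arbitrary choices.
mlp = torch.nn.Sequential(
    torch.nn.Linear(2 * hidden, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
prediction = mlp(edge_repr).squeeze(-1)  # one score per candidate edge
```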
Thanks @rusty1s! I have some questions regarding your last comment.
In edge_attr, by vendor_x do you mean new_vendor_x or new_vendor_x2? Here you are using the node representations to represent the edge, right?
In data.edge_attr I had the magnitude of the edge (how many times customer_id ordered from vendor_id). What is the correct way to incorporate those weights into the edge representation? What I want from the weight is that if two customers order many times from the same vendor, those two nodes are represented as more similar to each other.
def message(self, x_i, x_j, edge_attr):
    return torch.cat([x_i, x_j, edge_attr], dim=-1)
It's also a good idea to include it in the final edge representation to make predictions.
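Including the edge weight in the final edge representation can be sketched like this. The tensors are random stand-ins, and the log1p scaling of the order counts is an optional assumption of this sketch, not something suggested in the thread.

```python
import torch

torch.manual_seed(0)
hidden = 16
customer_x = torch.randn(3, hidden)  # stand-in hidden customer representations
vendor_x = torch.randn(2, hidden)    # stand-in hidden vendor representations

edge_index = torch.tensor([[0, 1, 2],
                           [0, 1, 1]])
weight = torch.tensor([5.0, 1.0, 12.0])  # hypothetical order counts per edge

# Order counts can be heavy-tailed; log1p is one optional way to compress them.
edge_attr = torch.log1p(weight).view(-1, 1)  # shape [num_edges, 1]

# Final edge representation: both endpoints plus the (scaled) weight.
edge_repr = torch.cat(
    [customer_x[edge_index[0]], vendor_x[edge_index[1]], edge_attr], dim=-1
)
```

The view(-1, 1) matters: torch.cat needs the weight as a column with one row per edge so that it lines up with the gathered node representations.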
Hi @LucasSDresl, I am more or less trying to do the same as you. Did you find any example on how to train with a Bipartite Graph, or do you have any implementation available?
Thank you very much!
Have you looked into our "Heterogeneous Graph Learning" tutorial?
I haven't seen it! That was super helpful.
However, I still don't know how to create a model with that dataset to recommend items. That link gives three ways, but I can't work out what num_classes would be in my example. I don't want to classify, so I am a bit confused by that.
Thank you
You can also take a look at our heterogeneous link prediction example, which is probably more related to the task you are trying to solve. Let me know if you have any further questions.
That is a perfect example! Thank you very much
As I experiment with the example, more and more questions arise!
In the examples shown, how would you make a recommendation of a movie to a user? I am aware that the task is link prediction so I guess I should make a prediction of every user-movie possible edge and take the one with the highest predicted label, but I am unsure of how to do it.
I have been looking at the way the test is made:
You pass the model the initial embeddings, the connections of the graph, and an edge_label_index, which I am also unsure about (it seems to be the same as the connections of the graph?).
Thank you very much for the patience!
The edge_label_index denotes all the connections for which you want to obtain a prediction/rating for movie/user pairs. As such, to recommend new movies to a user, you would need to set edge_label_index to something like:
row = torch.tensor([user_id] * num_movies)
col = torch.arange(num_movies)
edge_label_index = torch.stack([row, col], dim=0)
Following your advice, I have been able to produce predictions. Nevertheless, it always recommends the same movie (to all the users I have tried). Am I doing something wrong, or is it because the model is simple? (I also tried training for 10000 epochs, but the results are the same.)
movie_mapping = {i: idx for i, idx in enumerate(df_movies.index)}
num_movies = len(data['movie'].x)
row = torch.tensor([USERID] * num_movies)
col = torch.arange(num_movies)
edge_label_index = torch.stack([row, col], dim=0)
pred = model(data.x_dict, data.edge_index_dict, edge_label_index)
pred = pred.clamp(min=0, max=5)
idx_max = torch.argmax(pred)
movieId = movie_mapping[int(idx_max)]
Thank you very much for your help
What happens if you look at the topk of movie predictions?
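Looking at the top-k predictions can be sketched with torch.topk. The scores below are invented for illustration. One thing worth checking, as a possibility rather than a diagnosis: clamping to [0, 5] before calling argmax turns every score above 5 into a tie at 5.0, so ranking on the raw scores and clamping only for display avoids that.

```python
import torch

# Hypothetical predicted ratings of one user for 6 movies (index = movie index).
pred = torch.tensor([3.2, 5.7, 1.0, 4.1, 6.3, 4.9])

# Rank on the raw scores; clamping first would make 5.7 and 6.3 tie at 5.0.
top = torch.topk(pred, k=3)
top_movies = top.indices.tolist()  # movie indices, best first

# Clamp only the values that are shown to the user.
top_scores = top.values.clamp(min=0, max=5).tolist()
```

The indices in top_movies can then be passed through a mapping such as movie_mapping to recover the original movie IDs.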
Okay! I don't really know why but it is working now, thank you very much!
Last thing, what happens if a user is not connected to any movie or vice versa?
Its embedding is not trained, and the recommendation is likely going to be random :)
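To find out which users are affected, one option is to count edges per user and look for zeros. This is a minimal sketch with a made-up edge list; in a heterogeneous graph the edge_index of the user-to-movie relation would play the role of edge_index here.

```python
import torch

num_users = 5
# Hypothetical user -> movie edges; row 0 holds user indices.
edge_index = torch.tensor([[0, 0, 2, 3],
                           [1, 4, 4, 0]])

# Count outgoing edges per user; minlength keeps users that never appear.
degree = torch.bincount(edge_index[0], minlength=num_users)

# Users with no edges at all receive no training signal through the graph.
isolated_users = (degree == 0).nonzero(as_tuple=True)[0]
```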
Perfect! Makes sense! Thank you very much for all your help :)
Two last questions:
How could I make a recommendation for a user outside the system? I mean, imagine we have a new user for whom we know some links to movies, but who was not part of training. Would the only way to make recommendations be to retrain the model on the new graph?
I understand that GNNs work by passing information between neighbours. I also know there are roughly three types of recommendation: content-based, collaborative, and hybrid. My guess is that this system is hybrid, because movies somehow adapt to users and vice versa, but I am not really sure. Is there any paper I could read on this subject?
Thank you very much! I am learning a lot, but it is very difficult
Sorry for the late reply.
Thank you very much for your response! I will get into it
❓ Questions & Help
I am new to pytorch_geometric and I am trying to use the BipartiteData class to load data from a dataframe in which weight_of_edge is how many times customer_id ordered from vendor_id.
I wanted to know whether I am passing the data from my dataframe to the BipartiteData class correctly. These are the variables I defined from the dataframe to feed BipartiteData:
weight = (torch.Tensor(df['weight_of_edg'].values)).long()
customer_id = (torch.Tensor(df['customer_id'].values)).long()
vendor_id = (torch.Tensor(df['vendor_id'].values)).long()
edge_index = torch.Tensor((np.vstack((customer_id, vendor_id)))).long()
Finally, I passed them this way:
data = BipartiteData(edge_index, customer_id, vendor_id)
data.edge_attr = weight
Is this okay? Thank you very much! Keep up the excellent work :)