vuptran / graph-representation-learning

Autoencoders for Link Prediction and Semi-Supervised Node Classification (DSAA 2018)
MIT License

Link prediction #3

Open abensaid opened 5 years ago

abensaid commented 5 years ago

Hi. For link prediction, the last layer is linear, so the output lies in R (the real numbers). However, the predicted adjacency for a node i should be in {0,1}. I suppose the output of the last layer should be passed through a softmax or a sigmoid instead of being left linear. Any thoughts?

vuptran commented 5 years ago

Yes, you are right, the last layer is linear and the input space is in [0,1]. During training, I used a cross-entropy loss function that normalizes the raw logit outputs via an internal sigmoid function; see https://github.com/vuptran/graph-representation-learning/blob/master/longae/models/ae.py#L23
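To illustrate what "cross-entropy with an internal sigmoid" means, here is a minimal standard-library sketch. It is not the code at the linked line; the function names `bce_from_logit` and `bce_from_prob` are hypothetical, and the formula is the commonly used numerically stable form of sigmoid cross-entropy. It shows that computing the loss directly from a raw logit gives the same value as applying a sigmoid first and then taking the usual binary cross-entropy.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_from_logit(logit, target):
    """Binary cross-entropy computed directly from a raw logit.

    Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_from_prob(prob, target):
    """Textbook binary cross-entropy on a probability in (0, 1)."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


# Compare the two routes on a few (logit, target) pairs.
for logit in (-3.0, -0.5, 0.0, 2.0):
    for target in (0.0, 1.0):
        direct = bce_from_logit(logit, target)
        two_step = bce_from_prob(sigmoid(logit), target)
        print(f"logit={logit:+.1f} target={target:.0f} "
              f"direct={direct:.6f} two_step={two_step:.6f}")
```

Because the sigmoid lives inside the loss, the network itself can keep a linear last layer and still be trained against 0/1 adjacency targets.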

During evaluation, I used the AUC metric, which only ranks the output scores and does not depend on their actual values. For applications on test data, though, you would want to apply a sigmoid function on top of the raw logits to normalize the outputs to calibrated probabilities.
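The point that AUC depends only on the ranking of the scores can be checked with a small sketch. The `auc` helper below is a hypothetical pairwise implementation written for illustration (it is not taken from the repository), and the logits and labels are made-up values. Since the sigmoid is strictly increasing, it preserves the ordering of the logits, so both calls should give the same AUC.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up raw logits for six candidate edges and their true labels.
logits = [2.3, -1.2, 0.4, -0.7, 3.1, 0.1]
labels = [1, 0, 0, 1, 1, 0]

probs = [sigmoid(x) for x in logits]

print("AUC on raw logits:   ", auc(logits, labels))
print("AUC on sigmoid(logit):", auc(probs, labels))
```

One caveat: for logits of very large magnitude the sigmoid saturates in floating point, which can introduce ties that were not present in the raw scores.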

idanh commented 5 years ago

@vuptran Thanks for your answer. I'm not OP but I'm interested in this library. Can you please expand on: "you would want to apply a sigmoid function on top of raw logits to normalize outputs to calibrated probabilities."

As I understand it (and I think I'm wrong.), you mean apply sigmoid(RecAdj_[v1, v2]) where RecAdj is the reconstructed adjacency matrix?

Thanks!

vuptran commented 5 years ago

Yes, after training the autoencoder, you can use the model's prediction method to get the predicted logits, which can be transformed to be in [0, 1] with a sigmoid function.
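As a sketch of this step, assume the trained model has already produced a matrix of raw logits. The variable `rec_adj_logits` below is a hypothetical stand-in with made-up values; it does not come from the library's prediction method. The element-wise sigmoid maps every entry into (0, 1), and the score for a node pair is then a plain lookup.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical reconstructed adjacency (raw logits) for a 3-node graph.
rec_adj_logits = [
    [0.0, 2.1, -1.4],
    [1.8, 0.0, 0.3],
    [-2.0, 0.5, 0.0],
]

# Element-wise sigmoid: every entry now lies strictly between 0 and 1.
rec_adj_probs = [[sigmoid(x) for x in row] for row in rec_adj_logits]

# Score for a link between nodes v1 and v2.
v1, v2 = 0, 1
link_score = rec_adj_probs[v1][v2]
print(f"score for link ({v1}, {v2}): {link_score:.4f}")
```

This matches the `sigmoid(RecAdj_[v1, v2])` reading in the question above: the transform is applied per entry, so it can be done on the whole matrix or only on the pairs of interest.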

theresiabudiman commented 3 years ago

Hello. I was trying to use the library but made a few changes to the code so that the input matrix is not symmetric. I have applied the sigmoid function to the reconstructed matrix after training, as I want the F1 score instead of the AUC and AP scores. I managed to get values that are between 0 and 1. Do I just apply a threshold of 0.5 to determine whether a link exists or not (so that if the value is >0.5, a link should exist; otherwise, there shouldn't be a link)? Thank you in advance.

vuptran commented 3 years ago

Yes, that should do it. You can also change the threshold depending on the level of desired precision vs. recall.
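The thresholding step and the precision/recall trade-off can be sketched as follows. The helper `precision_recall_f1` and the sample scores are hypothetical and written for illustration only. Scores are assumed to be sigmoid outputs for a flat list of candidate node pairs; with a non-symmetric matrix, entries (i, j) and (j, i) would simply be treated as separate predictions.

```python
def precision_recall_f1(probs, labels, threshold=0.5):
    """Threshold scores (strictly greater than) and compute precision, recall, F1."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up sigmoid scores for six candidate links and their true labels.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# A higher threshold tends to favor precision; a lower one favors recall.
for threshold in (0.35, 0.5, 0.7):
    p, r, f1 = precision_recall_f1(probs, labels, threshold)
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

If the threshold is tuned rather than fixed at 0.5, it is usually chosen on held-out validation links rather than on the test set, so that the reported F1 is not optimistic.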