Open sasan73 opened 3 years ago
Are you sure your loss computation is correct? It looks like you are summing over all examples, whereas what you presumably want is to sum over the last dimension (and take the mean after the sigmoid call). If your loss formulation is correct, you may want to try increasing the ratio of negative samples to force the model to learn distinguishable embeddings.
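For illustration (this is not the original code; the shapes are made up), the difference between the two reductions can be sketched as:

```python
import torch

# Hypothetical scores: a batch of 4 examples with 3 negative samples each
scores = torch.randn(4, 3)

# Summing over ALL elements collapses the whole batch into a single
# scalar before the sigmoid, which is usually not what you want:
wrong = torch.sigmoid(scores.sum())               # 0-d tensor, one value total

# Summing over the LAST dimension keeps one score per example;
# the mean is taken after the sigmoid call:
right = torch.sigmoid(scores.sum(dim=-1)).mean()  # mean over the 4 examples
```

With the second form, each example still contributes its own gradient instead of being merged into one number before the non-linearity.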
Have you solved this problem? I used AM-softmax loss + GATConv and encountered the same problem. The 3-d embeddings for the 48 nodes of a graph are all identical:
```
tensor([[-59.7394,  1.0199,  0.6965],
        [-59.7394,  1.0199,  0.6965],
        ...
        [-59.7394,  1.0199,  0.6965]], device='cuda:1',
       grad_fn=
```

(output truncated; all 48 rows are identical)
The attention coefficients (alpha) also look quite odd. It seems the attention over the concatenated node embeddings is not doing anything useful.
Hello, and thank you for this great library.
I am working on a recommendation problem and have implemented a graph neural network for it. For optimization, I chose the Bayesian Personalized Ranking (BPR) method, which essentially tries to maximize the difference between a user's predicted scores for items they have interacted with and items they have not. So the loss function looks like this:
Here u denotes the users; p and n are the positive (interacted) and negative (not-interacted) items.
To turn this into a minimization problem, we negate the objective and then apply a standard optimizer (e.g. gradient descent).
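As a sanity check, here is a minimal sketch of the negated BPR objective with dot-product scoring (all names are my own; this is not the poster's code):

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_emb, neg_emb):
    """Negated BPR objective: minimizing this maximizes the score
    gap between each user's positive and negative items."""
    pos_scores = (user_emb * pos_emb).sum(dim=-1)  # score(u, p) per pair
    neg_scores = (user_emb * neg_emb).sum(dim=-1)  # score(u, n) per pair
    # -log sigmoid(pos - neg); F.logsigmoid is the numerically stable form
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# Toy batch of 8 (user, positive item, negative item) triples, 16-d embeddings
u = torch.randn(8, 16)
p = torch.randn(8, 16)
n = torch.randn(8, 16)
loss = bpr_loss(u, p, n)
```

Note that minimizing this negated loss drives `pos_scores - neg_scores` up, i.e. it maximizes the gap; if the embeddings still collapse, the problem is usually elsewhere (scoring function, sampling, or the GNN layers).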
However, during training the feature embeddings of every user and item become more and more similar after each epoch. In the end I get this:
Here is what I think is happening: the embedding vectors become so alike because the model really is minimizing the loss (i.e. shrinking the difference between positive and negative items), which yields an embedding matrix with near-identical rows. What confuses me is that, since I negate the loss, the model should instead be trying to maximize that difference.
Here is my code:
Here is how the model is defined. I used PyTorch Geometric to compute the message passing in a graph convolutional network.
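I don't have the original model in front of me, but for reference, one graph-convolution layer of the kind PyTorch Geometric's `GCNConv` implements can be sketched in plain PyTorch as follows (symmetric normalization as in Kipf & Welling's GCN; all names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN-style message-passing layer: D^{-1/2} (A + I) D^{-1/2} X W."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))   # add self-loops
        deg = a_hat.sum(dim=1)                 # node degrees (all >= 1)
        d_inv_sqrt = deg.pow(-0.5)
        norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return norm @ self.lin(x)              # aggregate neighbor messages

# Toy graph: 4 nodes on a path, 8 input features, 3-d output embeddings
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
x = torch.randn(4, 8)
layer = SimpleGCNLayer(8, 3)
out = layer(x, adj)
```

If every node ends up with the same output here, the inputs or the graph are degenerate; with distinct inputs and this normalization, rows should generally differ, which is a useful baseline when debugging embedding collapse.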