Open DaehanKim opened 5 years ago
This is a very good question. In my experience this type of setup can work very well. We are currently preparing a report where we describe this setup in more detail. We train the model with negative sampling (for scalability reasons) and we use only a small number of training epochs -- so maybe this is what makes the difference in your case. We will release the code for these experiments soon, in case you want to compare implementations.
Yes, you would sub-sample the negative class (non-edges) so that both classes are balanced in the binary cross-entropy loss. This is the same technique that is used in word2vec, for example.
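A minimal sketch of what this balanced setup could look like (this is my own illustration, not the released code): sample as many non-edges as there are edges, and average the binary cross-entropy over the two balanced groups.

```python
import numpy as np

def sample_negative_edges(adj, num_samples, rng=None):
    """Sample non-edges (negative examples) from a dense adjacency matrix.

    Returns an array of shape (num_samples, 2) of node pairs (i, j)
    with adj[i, j] == 0 and i != j. Rejection sampling is fine here
    because real graphs are sparse, so most random pairs are non-edges.
    """
    rng = np.random.default_rng(rng)
    n = adj.shape[0]
    negatives = []
    while len(negatives) < num_samples:
        i, j = rng.integers(0, n, size=2)
        if i != j and adj[i, j] == 0:
            negatives.append((i, j))
    return np.array(negatives)

def balanced_bce(probs_pos, probs_neg, eps=1e-9):
    """Binary cross-entropy over positives and an equally sized sample
    of negatives; probs are predicted edge probabilities in (0, 1)."""
    loss_pos = -np.log(probs_pos + eps).mean()
    loss_neg = -np.log(1.0 - probs_neg + eps).mean()
    return 0.5 * (loss_pos + loss_neg)
```

For each training step you would sample one negative pair per positive edge, so both classes contribute equally to the loss without ever touching the full N x N adjacency.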
We found that the model can sometimes discard too much of the information in the original node features if trained to convergence on the link prediction objective. Early stopping or some other regularizer can help alleviate this problem.
On Fri, Jul 12, 2019 at 2:35 PM DaehanKim notifications@github.com wrote:
- By 'negative sampling', do you mean balancing the classes (the same number of examples per class) in a multi-class setting?
- With a small number of training epochs, does the model converge? Or does model convergence not matter in this case?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tkipf/gae/issues/44?email_source=notifications&email_token=ABYBYYDAO255LIJ5KC2QMZTP7B3ADA5CNFSM4ICGM6GKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZZUDXY#issuecomment-510869983, or mute the thread https://github.com/notifications/unsubscribe-auth/ABYBYYE7DJLO2DRMIU65NYLP7B3ADANCNFSM4ICGM6GA .
Thanks for the reply.
I thought the positive weight in your implementation was there to balance positive edges against negative edges. As I see it, this serves a similar purpose to sub-sampling, in the sense that it makes positive edges contribute to the loss as much as negative edges do. Do you think this strategy is not sufficient for robust training for node classification?
Weighting should in principle be fine too, but the training dynamics can certainly differ depending on whether you choose a weighting strategy or negative sampling. I would always prefer negative sampling in practice, as it is much more scalable.
Thanks for all the detailed replies. I'm looking forward to the release of your reference code for node classification.
Any update on this?
Hi,
I just made a small modification to VGAE to perform node classification in an unsupervised fashion.
In detail, I used the full adjacency matrix, since there is no need for link prediction. Everything else is the same as the original implementation.
After training the model, I randomly chose 40% of all nodes as the training set for scikit-learn's default logistic regression module (as practiced by many authors, including those of 'A graph autoencoder for attributed network embedding'). I used 20% as the validation set and 40% as the test set.
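For reference, a sketch of that evaluation protocol (my own illustration; `evaluate_embeddings` is a hypothetical helper, and `max_iter` is raised only to avoid convergence warnings, otherwise scikit-learn defaults are used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def evaluate_embeddings(z, labels, rng=None):
    """Score node embeddings z with a 40/20/40 train/val/test split
    and scikit-learn's logistic regression; returns (val, test) accuracy."""
    rng = np.random.default_rng(rng)
    n = z.shape[0]
    idx = rng.permutation(n)  # fresh random split each run
    n_train, n_val = int(0.4 * n), int(0.2 * n)
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    clf = LogisticRegression(max_iter=1000).fit(z[train], labels[train])
    return (accuracy_score(labels[val], clf.predict(z[val])),
            accuracy_score(labels[test], clf.predict(z[test])))
```

Repeating this over several random splits and averaging, as described above, gives the reported accuracy.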
On the Cora dataset, I found that accuracy is just 31% on the validation set across several runs (random splits of the dataset). I don't see what the problem is with my approach and would appreciate your advice. Or is it just that an unsupervised approach is not appropriate for node classification?
Thanks a lot.