Open sandeepsingh opened 7 years ago
The current setup of this code release is optimized for transductive learning. For inductive learning, have a look at this recent paper: https://arxiv.org/abs/1706.02216
Nonetheless you can do inductive learning with our code as well, but it requires a few minor adjustments (exclude unlabeled test set nodes from training set).
Thanks for the quick reply.
You mean remove the nodes/feature vectors from the allx matrix that do not have one-hot labels in ally?
Just in case, restating the problem: I have allx, the feature vectors of both labeled and unlabeled training instances (a superset of x). But I do not have one-hot labels for all the nodes/feature vectors in allx, i.e. ally, the one-hot labels for instances in allx, is incomplete. What should I do in this case?
I do have x (feature vectors of the training instances) and y (one-hot labels for the training instances), but this is a very small set, about 10% of allx.
What you describe sounds like a transductive learning problem, i.e. you should be able to use our code 'out of the box' with no or only minor modifications. Just make sure to mask out all unlabeled nodes in the loss function (should you train in full-batch mode, i.e. not using mini batches).
Hi, I also have doubts about how to prepare my data when I use the code for a transductive learning problem. First, can the code be used directly for the transductive learning problem mentioned in your paper? Second, since I only have some of the labels, how should I prepare the unlabeled data (especially the y input)? Thanks so much.
Yes -- and for the unlabeled data you can provide 0-vectors as labels (i.e. all 0s). Just make sure these are masked/skipped in the loss function.
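A minimal sketch of this masking, assuming plain NumPy and a binary train mask (names like `train_mask` are illustrative, not taken from the repo):

```python
import numpy as np

def masked_softmax_cross_entropy(logits, labels, mask):
    """Average cross-entropy over masked (labeled) nodes only.

    Unlabeled nodes carry all-zero label vectors and mask == 0,
    so they contribute nothing to the loss or the gradient.
    """
    # numerically stable log-softmax over the class dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_node_loss = -(labels * log_probs).sum(axis=1)  # zero for all-zero labels
    mask = mask.astype(float)
    return (per_node_loss * mask).sum() / mask.sum()

# 3 nodes, 2 classes: node 2 is unlabeled (0-vector label, mask = 0)
logits = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
labels = np.array([[1, 0], [0, 1], [0, 0]])
train_mask = np.array([1, 1, 0])
loss = masked_softmax_cross_entropy(logits, labels, train_mask)
```

Because node 2's label vector is all zeros and its mask entry is 0, changing its logits leaves the loss (and hence the gradient) untouched.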
Hi, sorry to keep asking questions (I am new to this area), but I still have two questions:
1) We simply leave unlabeled data points out of the sum (the total loss is a sum of per-node losses), so they do not contribute to the gradient.
2) For temporal data, you have a few options: i) simply use an RNN in conjunction with the GCN to model the time component (either at the beginning of the model before the GCN layers, or even intermixed with them); ii) if the sequence length is always the same and non-stationary, it might make sense to concatenate the feature vectors over the sequence and feed them directly into the unmodified GCN model.
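Option ii) above, concatenating the feature vectors over a fixed-length sequence, can be sketched as follows (shapes are illustrative):

```python
import numpy as np

# N nodes, each with a time series of T steps and F features per step.
# For a fixed, equal sequence length, flatten time into the feature axis
# so the unmodified GCN sees one static feature vector per node.
N, T, F = 5, 8, 3
X_temporal = np.random.randn(N, T, F)    # (nodes, time, features)
X_static = X_temporal.reshape(N, T * F)  # (nodes, time * features)
```

The resulting (N, T*F) matrix can then be used as the regular input feature matrix of the GCN.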
Thanks a lot for your rapid reply! Your suggestions really enlightened me and I will try to integrate them into my work. But for the first question, I am sorry for describing it unclearly. What I am actually confused about is why this is reasonable: why don't we need to consider the unlabeled data when we calculate the losses? Thanks again.
You can think of the GCN as a method for creating node embeddings (the representation just before the classification layer at the last layer of the model). To get an embedding for a node in the training set, we aggregate feature information from neighboring nodes (these can be labeled or unlabeled nodes; that doesn't matter, as we only care about node features at this point). In the end, we can use the classification layer to predict a label for one of our nodes. For nodes where we know the labels, we would like the predicted label to be close to the ground truth, so we define a loss function and optimize the model parameters. We do this for all training set nodes (since these are the only ones we have labels for). In this picture it becomes clearer why this is a reasonable thing to do. Hope this helps!
Thank you very much for your patient reply! Sorry to ask one last question: is it right that we cannot evaluate prediction accuracy on the unlabeled data when we perform transductive learning? (My main purpose is to predict the class of the unlabeled data as accurately as possible.) Sorry for disturbing you with so many questions!
Even if you perform transductive learning with a GCN model, the model will still be able to predict in an inductive setting. The model generally has a favorable inductive bias that allows it to do that (plus we don't require knowledge of test set nodes to predict). Note that this is only the case if you have node features other than unique one-hot vectors.
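To illustrate why transductively trained weights transfer inductively: the learned weight matrices act only on node features and the normalized adjacency, never on node identities, so they can be applied to a graph never seen during training. A minimal NumPy sketch of a two-layer GCN forward pass (the weights here are random stand-ins for trained ones):

```python
import numpy as np

def normalize_adj(A):
    """Renormalized adjacency D^-1/2 (A + I) D^-1/2, as in the GCN paper."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN: softmax(A_norm relu(A_norm X W1) W2)."""
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)  # hidden layer (ReLU)
    logits = A_norm @ H @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# "Trained" weights can be reused on a new, unseen graph of any size,
# as long as nodes carry real-valued features (not unique one-hot IDs).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))
A_new = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # unseen 3-node graph
X_new = rng.normal(size=(3, 4))                                   # real-valued node features
probs = gcn_forward(A_new, X_new, W1, W2)                         # (3 nodes, 3 classes)
```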
Hello Thomas,
I want to use the GCN for segmentation of the cerebral cortex using fMRI images. Each voxel has a time series representing its functional activity over time.
I looked at the comment above where you mentioned to @whuqyl that:
For temporal data, you have a few options: i) Simply use an RNN in conjunction with the GCN to model the time component (either at the beginning of the model before the GCN layers or even intermixed with them)
Is there any publication you can point me to so that I can get more insight into this, please?
My understanding is that one way I can approach the problem is by using the correlation matrix (between the time series of the voxels), which can be seen as the adjacency matrix and used to learn embeddings in the graph; a classification layer can then simply identify clusters. As for the feature matrix, I understood that it is possible to use the identity matrix (in the absence of node features) and the model still performs well. Am I missing something here?
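The construction described above can be sketched as follows (the data, the correlation threshold of 0.2, and all names are illustrative assumptions, not from the repo):

```python
import numpy as np

# Hypothetical fMRI-like data: V voxels, each with a time series of length T.
rng = np.random.default_rng(42)
V, T = 6, 100
ts = rng.normal(size=(V, T))

# Correlation between voxel time series -> weighted graph.
C = np.corrcoef(ts)                  # (V, V), values in [-1, 1]
np.fill_diagonal(C, 0.0)             # drop self-correlation; GCN adds self-loops itself
A = (np.abs(C) > 0.2).astype(float)  # binarize by thresholding (0.2 is arbitrary)

# Featureless case: identity matrix as node features, as in the GCN paper.
X = np.eye(V)
```

With identity features, the first GCN layer effectively learns a free embedding per node, which is exactly the featureless setting discussed in the paper.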
Hi, Thanks for your question.
1) Here's a paper that summarizes this idea nicely: https://arxiv.org/abs/1704.06199
2) Yes, this is correct.
Hi @tkipf, is there any function in your GCN code that lets us directly get the embedding model? I'm struggling with removing the last layer to get the embeddings.
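If I remember the repo's `models.py` correctly, the model stores the output of each layer in a list `self.activations`, so fetching the second-to-last entry via `sess.run` should give the embeddings without modifying the model. In any case, conceptually the embedding is just the hidden-layer activation before the classification layer; a NumPy sketch of that idea:

```python
import numpy as np

def normalize_adj(A):
    """Renormalized adjacency D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_embed(A, X, W1):
    """Hidden-layer activations of a two-layer GCN: these are the node
    embeddings, i.e. the representation just before the classification layer."""
    return np.maximum(normalize_adj(A) @ X @ W1, 0.0)

A = np.array([[0, 1], [1, 0]], dtype=float)
X = np.eye(2)          # featureless case: identity features
W1 = np.ones((2, 4))   # stand-in for trained first-layer weights
emb = gcn_embed(A, X, W1)  # (2 nodes, 4-dim embeddings)
```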
Hi,
I have allx but I do not have ally (the labels). In this case, how can I train the model in a semi-supervised (inductive learning) way, as done in https://github.com/kimiyoung/planetoid? Please help me with where to make the changes.
Thanks