tkipf / pygcn

Graph Convolutional Networks in PyTorch
MIT License
5.11k stars 1.22k forks source link

How to do a semi-supervised learning? #73

Open cxyccc opened 3 years ago

cxyccc commented 3 years ago

Can this code (pygcn) be used directly in transductive learning? I notice that the train loss (in train.py) is calculated as loss_train = F.nll_loss(output[idx_train], labels[idx_train]), but in paper SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS , the author says that he only calculates train loss for labeled data. I think the loss_train here is not consistent with that in paper.

Many thanks.

FranLucchini commented 3 years ago

Maybe I'm confused, but I thought idx_train was the list of IDs for labeled data. The model gets the full set of features, while we use that list to select the examples that were previously defined as labeled. Since the dataset is not only meant for semi-supervised training, they made a selection of examples to be defined as "labeled".

I did look into utils.py and the function load_data seems to take care of that in line 41.

cxyccc commented 3 years ago

Thank you so much! As you mean, the input of the model is the features of all the data and a part of the labels, and the data corresponding to this part of the labels is the training set and the validation set. The remaining unlabeled data corresponds to the test set. If there is a problem with this understanding?

FranLucchini commented 3 years ago

The input of the model is the whole features and the adjacency matrix, not the labels. A part of the labels is used to build the train set and validation set and those are used to calculate the loss in each epoch. That is shown in line 67 for training labels and 78 for validation (train.py).

As you mentioned, it seems that the remaining labels are used to build the test set.

So I would say your understanding is almost correct, except for the input of the model.

cxyccc commented 3 years ago

Thanks for your reply! So 'semi-supervised' means that the input of the model is the whole features instead of only the features of train set (which is usually used as the model input in supervised learning). In other words, the model learns the features of test set during the training process. If there is a problem with this understanding?

FranLucchini commented 3 years ago

Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.

DM0815 commented 2 years ago

Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.

I feel confused. Can I ask you? your means that the model use the train and test features to train model? ranther the whole feature?