Open cxyccc opened 3 years ago
Maybe I'm confused, but I thought `idx_train` was the list of IDs for labeled data. The model gets the full set of features, while we use that list to select the examples that were previously defined as labeled. Since the dataset is not only meant for semi-supervised training, they made a selection of examples to be defined as "labeled".
I did look into `utils.py`, and the function `load_data` seems to take care of that in line 41.
Thank you so much! If I understand you correctly, the input of the model is the features of all the data plus a part of the labels. The data corresponding to this part of the labels forms the training set and the validation set, and the remaining unlabeled data forms the test set. Is there a problem with this understanding?
The input of the model is the full feature matrix and the adjacency matrix, not the labels. A part of the labels is used to build the train set and the validation set, and those are used to calculate the loss in each epoch. That is shown in line 67 for the training labels and line 78 for the validation labels (`train.py`).
As you mentioned, it seems that the remaining labels are used to build the test set.
So I would say your understanding is almost correct, except for the input of the model.
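To make that concrete, here is a minimal pure-Python sketch of the idea (not the repo's code). The toy scores, labels and index lists are made up for illustration, and `log_softmax` and `masked_nll` are hypothetical helpers that mimic taking a mean negative log-likelihood over selected rows only:

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over one row of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def masked_nll(log_probs, labels, idx):
    """Mean negative log-likelihood over the nodes listed in idx only."""
    return -sum(log_probs[i][labels[i]] for i in idx) / len(idx)

# Hypothetical toy data: 5 nodes, 2 classes. The scores stand in for what a
# model would produce from ALL node features plus the adjacency matrix.
scores = [[2.0, 0.5], [0.1, 1.2], [1.5, 1.4], [0.3, 0.9], [1.0, 1.0]]
labels = [0, 1, 0, 1, 0]
idx_train, idx_val, idx_test = [0, 1], [2], [3, 4]

# The forward pass yields one row per node, labeled or not.
output = [log_softmax(row) for row in scores]

# Labels enter only here, and only for the selected indices.
loss_train = masked_nll(output, labels, idx_train)
loss_val = masked_nll(output, labels, idx_val)

# Changing labels outside idx_train leaves the training loss untouched.
corrupted = labels[:]
corrupted[3] = 0
corrupted[4] = 1
loss_train_corrupted = masked_nll(output, corrupted, idx_train)
```

So the labels never go into the model itself. They only appear in the loss, restricted to whichever index list is used.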
Thanks for your reply! So 'semi-supervised' means that the input of the model is the features of all the data, instead of only the features of the train set (which is what usually serves as the model input in supervised learning). In other words, the model learns from the features of the test set during the training process. Is there a problem with this understanding?
Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.
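A small sketch of why the unlabeled features matter, assuming a simplified mean aggregation over each node and its neighbours (this is not the exact normalization from the paper or the repo, and `propagate`, the graph and the feature values are all hypothetical):

```python
def propagate(adj, features):
    """One mean-aggregation step: each node averages itself and its neighbours."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        # Neighbourhood of node i, including node i itself (self-loop).
        neigh = [j for j in range(n) if adj[i][j] or i == j]
        out.append([sum(features[j][d] for j in neigh) / len(neigh)
                    for d in range(dim)])
    return out

# Hypothetical path graph 0 - 1 - 2; node 0 is labeled, nodes 1 and 2 are not.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

hidden = propagate(adj, features)

# Alter only the features of the unlabeled neighbour (node 1).
changed = [row[:] for row in features]
changed[1] = [10.0, 10.0]
hidden_changed = propagate(adj, changed)

# The representation of the labeled node 0 changes as well.
print(hidden[0], hidden_changed[0])
```

The representation of the labeled node depends on its unlabeled neighbour, so the features of unlabeled nodes influence training even though their labels never enter the loss.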
> Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.

I'm confused. May I ask: do you mean that the model uses the train and test features to train the model, rather than the whole feature set?
Can this code (pygcn) be used directly for transductive learning? I notice that the train loss (in `train.py`) is calculated as
`loss_train = F.nll_loss(output[idx_train], labels[idx_train])`,
but in the paper "Semi-Supervised Classification with Graph Convolutional Networks", the author says that he only calculates the train loss for labeled data. I think the `loss_train` here is not consistent with the one in the paper. Many thanks.