Why compare the supervised OGC with other semi-supervised models?

Kwongfei commented 6 months ago

Hallo. I appreciate your effort to analyse GCN from an optimization perspective and try to design GCN from a new perspective.

However, I noticed that OGC is a supervised method (mentioned in the abstract) since it involves the label information Y to update U (see eq.6). Therefore, I think that it is unfair to compare this supervised method with other semi-supervised models that use only 20 labels per class for training in Table 1.

Furthermore, research [1] shows (in Table 1) that GCN with ground true labels can achieve 100% accuracy on cora, citeseer, and pubmed datasets. Hope this information can help you.

[1] Topology Optimization based Graph Convolutional Network

zhengwang100 commented 6 months ago

In OGC, when update U, only the train and val part of the labeled set is used. In other words, only Y[train+val, :] is used, not all the information of the ground truth label Y.

In the main_ogc.py, we use the mask "idx_train_val" to only use Y[train+val, :]. In the PyG offical code, it uses "data.trainval_mask" to make sure this. Therefore, OGC is a semi-supervised method which adopt both the train and val label part for model learning.

zhengwang100 commented 6 months ago

For the second question, in [1], it describes that "GCN-GT employs the ground truth membership matrix" in which the matrix size is [N*N]. It seems like GCN-GT use all the information of the ground truth label Y.

Hope this will help you.

Kwongfei commented 6 months ago

Thanks for your reply.

Your method is impressive. But does that mean you use the labels from the validation dataset for training? GCN only use the labels from the training set for backward propagation, such as the code of GCNII:

loss_train = F.nll_loss(model()[data.train_mask], data.y[data.train_mask])

zhengwang100 commented 6 months ago

Yes, in the implementation details, we describe that, "like LP and C&S, we use both train and val part for model learning". OGC is just a shallow method, with only one-layer trainable parameters. Therefore, it does not need lots of validation data to avoid over-fitting; in contrast, typical GNNs (like GCN and GCNII) need.

Actually, we can rethink the functionality of validation dataset in deep learning: during model learning, typical GNNs (even other neural networks like CNNs and RNNs) need huge validation part (like in Citesser, val part is almost 5x more than train part) to avoid over-fitting - Does this mean that they also adopt validation dataset for trainable parameter learning/adjusting?

In light of this, can we design novel training strategies (like the introduced LIM trick) to better utilize these huge validation datasets to train more powerful deep NNs? It may be a very promising research direction, especially for large models like ChatGPTs.

Wish to see more attempts in this direction.

Kwongfei commented 6 months ago

Thanks for your explanation. I see the description in C&S "The validation set $L_v$ is used to tune hyperparameters such as learning rates and the hidden layer dimensions for the MLP". I tried to use the label information validation set for updating the params on the Cora dataset, and I would like to share the results:

#layers	Test ACC(reported)	Test ACC(train with validation dataset)
2	$82.2\%$	$85.4\%$
4	$82.6\%$	$86.2\%$
8	$84.2\%$	$86.2\%$
16	$84.6\%$	$86.2\%$
32	$85.4\%$	$86.9\%$
64	$85.5\%$	$83.9\%$

The ACC increased significantly for shallow GCNII but decreased for deep GCNII (64 layers). It indicates that the typical GCN indeed needs the validation dataset to avoid over-fitting. Maybe some powerful shallow LLMs can be built with novel training strategies like LIM trick.

zhengwang100 / ogc_ggcm

Why compare the supervised OGC with other semi-supervised models? #1