Hello! Thanks for your excellent work! I have a question about your calculation of the outer loss on the validation set. Under the standard split, the validation set for Cora, Citeseer, and PubMed is several times larger than the training set, and it seems unfair that you compute the loss on the validation set. Could you please let me know if I'm missing something?
Dear @nixror,
thanks for reaching out. Could you please elaborate a bit more on your question (e.g., unfair with respect to whom/what)? I recall that the GNN weights are optimized only using the training error on the standard split.
Best wishes, Luca
Thank you for your reply! The inner loss during training is l(f_w(X, A)), where the model parameters w are optimized on the training set. However, the adjacency matrix A now needs to be optimized through the outer loss, so A is also part of the learned parameters, and it is optimized on the validation set.
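For concreteness, the bilevel problem I have in mind is roughly the following (my own notation, writing L_train/L_val for the inner/outer losses and ignoring the sampling over discrete graph structures):

```latex
\min_{A} \; L_{\mathrm{val}}\!\left(f_{w^*(A)}(X, A)\right)
\quad \text{s.t.} \quad
w^*(A) = \arg\min_{w} \; L_{\mathrm{train}}\!\left(f_w(X, A)\right)
```

So the outer objective, which drives the choice of A, is evaluated on the (much larger) validation set.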
Dear @nixror,
We received a similar question from a reviewer. Here's an extract from our rebuttal:
[In the experiments in the paper, we actually] always split the validation set in two parts: one for learning the graph structure and the other for setting the remaining hyperparameters. The train+validation data is identical for all methods, including the baselines, which use the validation set to tune their respective hyperparameters (e.g. regularization, dropout, distance measures, ... [note that different algorithms have their own sets of hyperparameters]). Since we consider the graph structure to be part of the hyperparameters, we believe this setup is justified.

Splitting the training set instead is not fair to LDS, since the GCN's parameters are then trained with less data than those of the baselines. However, following your suggestion, we ran experiments on Cora/Citeseer, splitting the training set into train-new/val-new and using val-new (50% of the train set) to learn the graph structure. The results are:

| edges% | Cora | Citeseer |
|--------|------|----------|
| 25%    | 73.1 | 68.6     |
| 50%    | 76.5 | 70.2     |
| 75%    | 79.4 | 71.4     |
| 100%   | 81.7 | 73.1     |

LDS still outperforms the baselines (GCN/GCN+RND) by up to 6%; greater gains are obtained when fewer edges are retained (compare with Fig. 2, left and center).
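In case it helps, here is a minimal sketch of what splitting the standard validation set into a graph-structure part and a hyperparameter part could look like (illustrative Python, not taken from the LDS codebase; the function name, fractions, and index range are assumptions):

```python
import numpy as np

def split_validation(val_idx, frac_structure=0.5, seed=0):
    """Split the standard validation indices into two disjoint parts:
    one used by the outer objective to learn the graph structure,
    the other to tune the remaining hyperparameters.
    Illustrative sketch only; names and fractions are not from the LDS code."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(val_idx))
    n_structure = int(frac_structure * len(val_idx))
    val_structure = val_idx[perm[:n_structure]]  # drives the outer (graph-learning) loss
    val_hyper = val_idx[perm[n_structure:]]      # used for the remaining hyperparameters
    return val_structure, val_hyper

# Example with a 500-node validation set (size of the standard Cora split);
# the exact index range here is illustrative.
val_idx = np.arange(140, 640)
val_structure, val_hyper = split_validation(val_idx)
```

The same kind of split applied to the training set gives the train-new/val-new setting reported in the table above.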
Hope this helps!
Best wishes, Luca
Dear @lucfra,
Thanks for your comprehensive reply!
Best wishes!