[Closed] Barcavin closed this issue 10 months ago.
Hi! The evaluation rule is stated as is. One can use validation edges for both training and inference as long as all hyper-parameters are selected based on validation edges (not test edges). As you rightly pointed out, our example code indeed only uses the validation set for inference, but it is just for simplicity. Your example code is totally valid, but it's a bit interesting to see you are validating on validation edges while also using validation edges as training supervision. So you are essentially using training loss to do model selection? Wouldn't that cause serious over-fitting?
I think overfitting may not be an issue here, or 2000 epochs of training has not yet reached the overfitting regime. More in-depth analysis may be needed. I also find it quite interesting that this naive method can achieve such good performance.
If the results can be reproduced, should the leaderboard be updated accordingly?
On Sat, Sep 2, 2023, Weihua Hu wrote the comment quoted above.
Got it. Thanks for clarifying. Please feel free to submit to our leaderboard yourself.
Hi,
According to the evaluation rules (https://ogb.stanford.edu/docs/leader_rules/#:~:text=The%20only%20exception,the%20validation%20labels.), Collab for link prediction allows using the validation set during model training. However, the example code at (https://github.com/snap-stanford/ogb/blob/master/examples/linkproppred/collab/gnn.py) seems to use the validation set only for inference rather than for training. After adding these validation edges to the training edges, vanilla SAGE can achieve 68+ Hits@50.
The implementation can be found here (https://github.com/Barcavin/ogb/tree/val_as_input_collab/examples/linkproppred/collab). In fact, GCN reaches 69.45 ± 0.52 and SAGE reaches 68.20 ± 0.35. The differences between this implementation and the original example code are:
I believe the most critical trick to making the model perform well is the learnable node embedding rather than the raw node attributes. To reproduce, please run
python gnn.py --use_valedges_as_input [--use_sage]
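The core of the flag above is simply folding the validation edges into the training edges before supervision and message passing. A minimal sketch of that merge, using plain Python stand-ins for the `split_edge` dict that OGB's `get_edge_split()` returns (the field names follow the real API, but the edge lists here are toy data, not the actual ogbl-collab split):

```python
# Toy stand-in for the split returned by
# PygLinkPropPredDataset('ogbl-collab').get_edge_split():
# each split holds an "edge" collection of (u, v) pairs.
split_edge = {
    "train": {"edge": [(0, 1), (1, 2), (2, 3)]},
    "valid": {"edge": [(3, 4), (0, 2)]},
    "test":  {"edge": [(4, 5)]},
}

def merge_valid_into_train(split_edge):
    """Use validation edges both as supervision targets and as
    message-passing edges (the --use_valedges_as_input idea),
    deduplicating undirected pairs."""
    seen = set()
    merged = []
    for u, v in split_edge["train"]["edge"] + split_edge["valid"]["edge"]:
        key = (min(u, v), max(u, v))  # canonical form for undirected edges
        if key not in seen:
            seen.add(key)
            merged.append((u, v))
    return merged

train_plus_valid = merge_valid_into_train(split_edge)
print(len(train_plus_valid))  # → 5 (3 train + 2 valid, none duplicated)
```

In the real script the edges are `torch.Tensor`s and the merge is a `torch.cat` followed by `to_undirected`; the list version above is only meant to make the dedup logic explicit.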
Therefore, I am confused about what the correct way is to evaluate model performance on Collab.
Besides, I found that some of the submissions on the leaderboard of Collab utilize the validation set as training edges (both supervision signal and message-passing edges) while others use it only for inference (message-passing edges). This may cause an evaluation discrepancy for these models. For example, the current top-1 (GIDN@YITU) uses validation sets in the training, while ELPH uses the validation set only for inference.
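The discrepancy boils down to which of two roles the validation edges play. A schematic comparison of the two protocols (edge lists are toy data and the variable names are illustrative, not from any submission's code):

```python
# Two ways leaderboard entries use validation edges on ogbl-collab
# (schematic sketch; not taken from the actual submissions' code).
train_edges = [(0, 1), (1, 2)]
valid_edges = [(2, 3)]

# Protocol A (message passing only, as in the thread's ELPH example):
# validation edges enter the message-passing graph at inference time,
# but the positive supervision pairs stay train-only.
mp_edges_a = train_edges + valid_edges
supervision_a = train_edges

# Protocol B (as in the thread's GIDN@YITU example): validation edges
# are additionally used as positive supervision during training.
mp_edges_b = train_edges + valid_edges
supervision_b = train_edges + valid_edges

print(len(supervision_a), len(supervision_b))  # → 2 3
```

Both protocols see the same message-passing graph; only the supervision set differs, which is exactly why comparing the two kinds of submissions on one leaderboard is questionable.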
Thus, I believe a common protocol for evaluating models on Collab needs to be established for a fair comparison.
Thanks,