Hi @skepsun,
I am very grateful that you have discovered and shared this with us! Thank you so much!
Can you share the exact settings (e.g. `python gnn.py --num_layers ...`) you used for reaching 70+%?
If you want, you can implement the bugfix in a PR; otherwise I am happy to do it :)
Hi @jqmcginnis, I just made a PR to fix it. The command to reproduce the possible 70+% results is:
`python gnn.py --hidden_channels 3 --num_layers 2 --dropout 0.5 --lr 0.000001 --epochs 100`
I also discovered that exchanging positive and negative edges in training (by using `pos_loss = -torch.log(1 - pos_out + 1e-15).mean()` and `neg_loss = -torch.log(neg_out + 1e-15).mean()`) does not significantly affect the final results. Also, the training score moves in the opposite direction of the val/test scores: when val/test ROC-AUC goes up, train ROC-AUC goes down, and vice versa.
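For context, a minimal sketch of the two loss variants being compared, assuming `pos_out` and `neg_out` hold the predictor's probabilities for positive and negative edges as in the gnn.py example:

```python
import torch

# Standard link-prediction loss: push positive-edge scores toward 1
# and negative-edge scores toward 0.
def standard_loss(pos_out, neg_out):
    pos_loss = -torch.log(pos_out + 1e-15).mean()
    neg_loss = -torch.log(1 - neg_out + 1e-15).mean()
    return pos_loss + neg_loss

# "Inverted" variant discussed above: the roles of positive and
# negative edges are swapped.
def inverted_loss(pos_out, neg_out):
    pos_loss = -torch.log(1 - pos_out + 1e-15).mean()
    neg_loss = -torch.log(neg_out + 1e-15).mean()
    return pos_loss + neg_loss
```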
@weihua916 do you see this as a viable option? I am also unsure how to deal with this situation in the best possible way. As you are very experienced and accomplished in this field, I would very much appreciate your thoughts on this.
To provide some more background, let us consider the output of the following script:
`python gnn.py --hidden_channels 128 --num_layers 2 --dropout 0.0 --lr 0.0001 --epochs 100`
Using the non-inverted training process we obtain:
Run: 01, Epoch: 01, Loss: 1.3865, Train: 0.6315, Valid: 0.2660, Test: 0.2671
Run: 01, Epoch: 02, Loss: 1.3862, Train: 0.6312, Valid: 0.2660, Test: 0.2671
Run: 01, Epoch: 03, Loss: 1.3857, Train: 0.6305, Valid: 0.2662, Test: 0.2672
...
Run: 01, Epoch: 10, Loss: 1.3191, Train: 0.6269, Valid: 0.2680, Test: 0.2689
Run: 01, Epoch: 11, Loss: 1.3131, Train: 0.6269, Valid: 0.2680, Test: 0.2689
Run: 01, Epoch: 12, Loss: 1.3080, Train: 0.6269, Valid: 0.2679, Test: 0.2689
Run: 01, Epoch: 13, Loss: 1.3028, Train: 0.6269, Valid: 0.2679, Test: 0.2689
Run: 01, Epoch: 14, Loss: 1.2975, Train: 0.6269, Valid: 0.2679, Test: 0.2689
Run: 01, Epoch: 15, Loss: 1.2917, Train: 0.6269, Valid: 0.2679, Test: 0.2689
Run: 01, Epoch: 16, Loss: 1.2855, Train: 0.6269, Valid: 0.2679, Test: 0.2689
...
Run: 01, Epoch: 21, Loss: 1.2502, Train: 0.6675, Valid: 0.2918, Test: 0.2928
Run: 01, Epoch: 22, Loss: 1.2438, Train: 0.6659, Valid: 0.3057, Test: 0.3067
Run: 01, Epoch: 23, Loss: 1.2385, Train: 0.6681, Valid: 0.2975, Test: 0.2985
Run: 01, Epoch: 24, Loss: 1.2339, Train: 0.6690, Valid: 0.2999, Test: 0.3009
Run: 01, Epoch: 25, Loss: 1.2301, Train: 0.6705, Valid: 0.3072, Test: 0.3082
Run: 01, Epoch: 26, Loss: 1.2271, Train: 0.6709, Valid: 0.3037, Test: 0.3047
Run: 01, Epoch: 27, Loss: 1.2246, Train: 0.6708, Valid: 0.3040, Test: 0.3050
Run: 01, Epoch: 28, Loss: 1.2225, Train: 0.6714, Valid: 0.3087, Test: 0.3098
Run: 01, Epoch: 29, Loss: 1.2209, Train: 0.6722, Valid: 0.3103, Test: 0.3114
Run: 01, Epoch: 30, Loss: 1.2195, Train: 0.6718, Valid: 0.3084, Test: 0.3094
Run: 01, Epoch: 31, Loss: 1.2183, Train: 0.6728, Valid: 0.3098, Test: 0.3108
Run: 01, Epoch: 32, Loss: 1.2173, Train: 0.6721, Valid: 0.3118, Test: 0.3128
Run: 01, Epoch: 33, Loss: 1.2164, Train: 0.6723, Valid: 0.3129, Test: 0.3139
...
The same script and settings with the inverted loss that @skepsun described yield the following output:
Run: 01, Epoch: 01, Loss: 1.3871, Train: 0.3686, Valid: 0.7340, Test: 0.7329
Run: 01, Epoch: 02, Loss: 1.3863, Train: 0.3688, Valid: 0.7340, Test: 0.7329
Run: 01, Epoch: 03, Loss: 1.3857, Train: 0.3694, Valid: 0.7338, Test: 0.7328
...
Run: 01, Epoch: 09, Loss: 1.3286, Train: 0.3731, Valid: 0.7320, Test: 0.7310
Run: 01, Epoch: 10, Loss: 1.3176, Train: 0.3732, Valid: 0.7320, Test: 0.7310
...
Run: 01, Epoch: 21, Loss: 1.2470, Train: 0.3319, Valid: 0.7095, Test: 0.7085
Run: 01, Epoch: 22, Loss: 1.2413, Train: 0.3320, Valid: 0.7019, Test: 0.7010
Run: 01, Epoch: 23, Loss: 1.2364, Train: 0.3318, Valid: 0.7026, Test: 0.7017
Run: 01, Epoch: 24, Loss: 1.2321, Train: 0.3303, Valid: 0.6966, Test: 0.6956
...
Interesting. It looks like the model optimization quickly gets stuck in a local minimum. I'd suggest you at least make sure your model overfits (nearly 1.0 ROC-AUC) towards the end of training. Also, I feel absolute 3D coordinates are not appropriate as input to your model; using the relative displacements (x1, y1, z1) - (x2, y2, z2) as edge features makes more sense. In any case, this is just a baseline; for now, please make sure that the dataset itself is correct, and the community will figure out the best way to tackle this problem.
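A minimal sketch of that suggestion, assuming node coordinates are stored as a `[num_nodes, 3]` tensor `pos` and edges follow the usual `[2, num_edges]` `edge_index` layout (the names here are illustrative, not from the repo):

```python
import torch

def relative_displacement(pos: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Turn absolute 3D node coordinates into per-edge displacement
    vectors (x1, y1, z1) - (x2, y2, z2), usable as edge features."""
    src, dst = edge_index
    return pos[src] - pos[dst]  # shape: [num_edges, 3]
```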
I discussed this with other students in my group, and we decided against employing this trick for the leaderboard submissions, for several reasons.
Lastly, we love hearing your ideas and tricks to improve ogbl-vessel and its algorithms, and are happy to discuss any questions you might have.
Thank you very much for the feedback!
Cheers, Julian
Thank you Julian!
It'd be cool to see on the leaderboard how SEAL performs. Also, we should keep in mind that ROC-AUC is often an optimistic measure for link prediction. You can get 99.9% ROC-AUC while achieving only 10% Hits@50. The score really depends on how difficult the negative examples are. Just good to keep this in mind when we assess the ROC-AUC score.
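To illustrate that point numerically, here is a small, self-contained experiment with synthetic scores (not ogbl-vessel data): a handful of hard negatives barely dents ROC-AUC but can drive Hits@50 to zero.

```python
import torch

torch.manual_seed(0)
pos = torch.rand(1_000) * 0.4 + 0.5        # positive scores in [0.5, 0.9]
easy_neg = torch.rand(100_000) * 0.4       # easy negatives in [0.0, 0.4]
hard_neg = torch.rand(60) * 0.05 + 0.95    # 60 hard negatives in [0.95, 1.0]
neg = torch.cat([easy_neg, hard_neg])

# ROC-AUC = probability that a random positive outranks a random negative.
neg_sorted, _ = neg.sort()
auc = torch.searchsorted(neg_sorted, pos).float().mean() / len(neg)

# Hits@50 = fraction of positives scoring above the 50th-best negative.
kth_neg = neg.topk(50).values[-1]
hits50 = (pos > kth_neg).float().mean()

print(f'ROC-AUC: {auc:.4f}, Hits@50: {hits50:.4f}')  # ~0.9994 vs 0.0
```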
@weihua916 thank you very much for your comment!
I am still waiting for the final SEAL results (with 10 runs); the algorithm is comparatively slow, but we're getting there :slightly_smiling_face:
Thank you very much for bringing the choice of ROC-AUC as an evaluation metric to our attention again. We're eager to look into all of these topics with ogbl-vessel, and are curious what the community thinks and implements!
@jqmcginnis Thanks for updating SEAL results!
I have a question about the train scores during training. I tried GCN without any tricks and reached (several times, though not always) 70% val/test ROC-AUC with ~35% train ROC-AUC. I also implemented GCN+NeighborSampler with DGL and can stably reach 73% val/test ROC-AUC with ~35% train ROC-AUC. I am very curious whether SEAL reached 80% val/test ROC-AUC with <50% train ROC-AUC.
@skepsun happy to hear you are still working on this! :slightly_smiling_face:
Yes, SEAL_OGB is able to perform similarly well on the train set, e.g. this is the report after the first training epoch:
Command line input: `python seal_link_pred_train.py --dataset ogbl-vessel --use_feature`
SortPooling k is set to 10
100%|███████████████████████| 267295/267295 [1:49:55<00:00, 40.53it/s]
100%|███████████████████████| 267295/267295 [1:28:08<00:00, 50.54it/s]
100%|███████████████████████████| 33412/33412 [12:24<00:00, 44.89it/s]
100%|███████████████████████████| 33412/33412 [11:37<00:00, 47.91it/s]
Run: 01, Epoch: 01, Loss: 0.5186, Train: 80.76%, Valid: 80.82%, Test: 80.79%
The SEAL version on the SEAL_OGB master branch does not automatically compute the training scores. However, if you would like to run it yourself and also track the training process, feel free to use the `ogbl-vessel` branch in my fork, which also calculates the training scores :slightly_smiling_face:
I've also noticed that the OGB leaderboard has received another submission ("SAGE+JKNet") which seems to achieve similar ROC-AUC scores, so I do think the simplicity of GCN and SAGE might be the problem.
Happy to hear your feedback!
https://github.com/snap-stanford/ogb/blob/f5534d99703ab549ae4f7279f2002c6cc79041dc/examples/linkproppred/vessel/gnn.py#L139
I found that the predictor is not set to `predictor.eval()` in the test function in gnn.py, which may result in the poor performance of the GNN on this dataset. If `predictor.eval()` is added, the test ROC-AUC of GCN may reach 70+% even with hidden size 3, although sometimes it is stuck at 50%.
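For anyone following along, here is a minimal, self-contained illustration of why the missing call matters, using a toy predictor rather than the actual `LinkPredictor` from gnn.py: with dropout layers, a module left in training mode scores edges stochastically, so the test ROC-AUC is computed on noisy predictions.

```python
import torch

torch.manual_seed(0)

# Toy edge predictor with dropout, standing in for the predictor in gnn.py.
predictor = torch.nn.Sequential(
    torch.nn.Linear(3, 16),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(16, 1),
    torch.nn.Sigmoid(),
)

h = torch.randn(4, 3)  # toy node-pair embeddings

predictor.train()              # what test() effectively ran with: dropout active
print(predictor(h).flatten())  # differs on every call
print(predictor(h).flatten())

predictor.eval()               # the fix: dropout disabled, deterministic scores
print(predictor(h).flatten())
```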