Closed BearBiscuit05 closed 11 months ago
Hi @BearBiscuit05 , sorry for the late reply, I was really busy with another project for the last month...
If I guess right, you got around 0.49 accuracy on PA, right? Then the problem is in the data preprocessing. As mentioned in the accuracy section of the paper, we add bidirectional edges to PA for the accuracy experiments because this is common practice on the leaderboard and significantly boosts accuracy. You can simply use graph = dgl.to_bidirected(graph) during preprocessing to fix this problem.
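To make the effect of that preprocessing step concrete, here is a minimal plain-Python sketch of what "adding bidirectional edges" means at the edge-list level. It is not DGL's implementation, and the helper name `to_bidirected_edges` is hypothetical; in practice you would call `dgl.to_bidirected(graph)` as suggested above. The sketch assumes that mirrored duplicates should be collapsed into a single edge.

```python
def to_bidirected_edges(src, dst):
    """Mirror every directed edge and drop duplicates (illustrative helper only)."""
    edges = set()
    for u, v in zip(src, dst):
        edges.add((u, v))  # keep the original direction
        edges.add((v, u))  # add the reverse direction
    ordered = sorted(edges)  # deterministic order for easy inspection
    return [u for u, _ in ordered], [v for _, v in ordered]


# Toy citation-style graph: 0 -> 1, 1 -> 2, 2 -> 1
src, dst = to_bidirected_edges([0, 1, 2], [1, 2, 1])
print(list(zip(src, dst)))
```

With the real DGL call, it is worth checking in the DGL documentation whether node features are carried over to the returned graph (there is a `copy_ndata` argument for this, if I recall its signature correctly), since training needs the features on the bidirected graph.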
BTW, as validated by the code here, the sampling output of DUCATI is identical to that of DGL, and DUCATI does not change the GNN model or optimizer. Therefore, the accuracy of DUCATI is the same as that of DGL and any other framework that uses plain neighbour sampling.
Thank you very much for your response. Currently, I don't have a machine with sufficient memory to implement this. So, I would like to ask whether a simple three-layer Sage model is sufficient to obtain these results, because I observed that the entries on the OGB leaderboard seem to add MLP operations.
Yes, the results reported in the paper are obtained with a standard three-layer GraphSAGE model whose layer is a standard dglnn.SAGEConv as defined here
Thanks very much for your reply, I totally understand.
Hi, I'm sorry to ask again. With the help you provided, I successfully ran the accuracy on the transformed undirected graph PA today. However, I could only measure an accuracy of 0.55-0.56. I would like to understand the reasons for this situation. Currently, my parameters are set as follows: batch size: 1024, fanout: [10, 10, 10], hidden size: 256, dropout: 0.5.
Below are all the hyper-parameters:
Namespace(adj_budget=1.75, bs=1000, dataset='ogbn-papers100M', dropout=0.0, epochs=20, fanouts='10,10,10', lr=0.003, metric='acc', model='sage', nfeat_budget=5.25, num_hidden=256, pre_batches=100, pre_epochs=2, valbs=100, valfan='10,10,10')
Note that:
(1) According to common practice, evaluation should be done per epoch: evaluate the model on the validation set right after each epoch of training, and at the end choose the model with the highest validation accuracy for later use.
(2) The validation part uses a separate sampler from the training sampler; for simplicity, you can use dgl.dataloading.NeighborSampler(valfan).
(3) The accuracy is calculated with torchmetrics.functional.accuracy.
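The per-epoch evaluation and model selection described in the notes above can be sketched roughly as follows. This is a framework-free illustration, not the DUCATI training script: `train_with_model_selection`, `top1_accuracy`, and the `train_one_epoch` / `evaluate` callables are hypothetical names, and the validation numbers in the usage example are made up purely to exercise the loop.

```python
import copy


def top1_accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(int(p == t) for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def train_with_model_selection(model_state, train_one_epoch, evaluate, epochs):
    """Train for `epochs` epochs, validating after each one.

    Returns a copy of the state with the highest validation accuracy,
    together with that accuracy and the epoch at which it was reached.
    """
    best_acc = float("-inf")
    best_state = None
    best_epoch = -1
    for epoch in range(epochs):
        model_state = train_one_epoch(model_state)  # one pass over the training set
        val_acc = evaluate(model_state)             # validate right after the epoch
        if val_acc > best_acc:
            best_acc, best_epoch = val_acc, epoch
            best_state = copy.deepcopy(model_state)  # snapshot the best model so far
    return best_state, best_acc, best_epoch


# Toy usage with a fake "model" and made-up validation accuracies.
val_curve = [0.50, 0.61, 0.58]
state, acc, epoch = train_with_model_selection(
    {"step": 0},
    train_one_epoch=lambda s: {"step": s["step"] + 1},
    evaluate=lambda s: val_curve[s["step"] - 1],
    epochs=3,
)
print(state, acc, epoch)
```

In a real PyTorch run, `model_state` would typically be a copy of `model.state_dict()`, `evaluate` would iterate over a validation dataloader built with its own sampler, and the accuracy would come from `torchmetrics.functional.accuracy` rather than the toy `top1_accuracy` helper. Saving only the model from the last epoch skips the selection step and can report a different number.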
When running the PA dataset and preparing to test the model's correctness, I encountered some issues. I stored the model at the end of the entry function in the run_ducati.py file. Subsequently, I tested the trained model, but it seems that my testing has some problems, and the accuracy I obtained is not correct. I would like to know how to set the parameters to achieve results similar to the paper. If you could provide me with an update to the testing code, I would greatly appreciate it. The parameters I set are fanout [15, 15, 15], and epoch: 20.