muhanzhang / pytorch_DGCNN

PyTorch implementation of DGCNN
MIT License

Question about accuracy #19

Closed ChisamXz closed 3 years ago

ChisamXz commented 5 years ago

Hi, I have a question about how to report the accuracy. In your experiments, you reported the accuracy of the convergence result after a fixed number of epochs. For example, your code runs MUTAG for 300 epochs, reports the accuracy of the 300th epoch for each fold, and averages over 10 folds, which comes out to around 85%. However, if we adopt early stopping and report the best accuracy for each fold, we can achieve an accuracy of 91.6%. I'm curious why you didn't do that to get a better result.

muhanzhang commented 5 years ago

Hi, the reason is that I found using a validation fold to determine the stopping condition is not suitable here. Since the graph datasets are typically very small, a validation fold won't be representative enough to determine the optimal stopping epoch for the test fold. If you have other, better early-stopping strategies, please share them with me here. Thanks!

ChisamXz commented 5 years ago

Hi, for each fold, I just recorded the best accuracy seen so far, and if it didn't improve after 150 epochs, stopped training. It's very simple and I'm not sure if it's convincing enough.
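The stopping rule described above can be sketched as a patience counter over per-epoch accuracies. This is a minimal illustration, not code from the repo; the function name and the accuracy stream are hypothetical:

```python
def early_stop_best(accuracies, patience=150):
    """Track the best accuracy seen so far and stop once no improvement
    has occurred for `patience` consecutive epochs.

    accuracies: iterable of per-epoch accuracies (hypothetical input).
    Returns (best_accuracy, epoch_of_best).
    """
    best, best_epoch, waited = 0.0, -1, 0
    for epoch, acc in enumerate(accuracies):
        if acc > best:
            # New best: record it and reset the patience counter.
            best, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs
    return best, best_epoch
```

Note that, as pointed out later in this thread, applying this rule to *test* accuracies leaks the test set into model selection; it is only sound when the monitored accuracies come from a held-out validation fold.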

ChisamXz commented 5 years ago

BTW, when I used bsize=1 on MUTAG, I could still only get an accuracy of 83.3%, while you got 86.1%. Did you use this PyTorch version of the code, or did you make any changes that are not shown here? Thank you in advance~

muhanzhang commented 5 years ago

No, you can't do this. You cannot report the best test accuracy across all training epochs, as that is essentially using the test data as validation data to determine the stopping epoch. Although the code prints the test accuracy for every epoch, in practice you should use the test data only once, to evaluate your final model's performance. See Section 5.3, "Data Snooping", in "Learning from Data: A Short Course" for more details.

muhanzhang commented 5 years ago

For your second question, any CUDA/PyTorch/NumPy version differences between your machine and mine could lead to differences in results. That is why I suggest giving up MUTAG -- it is too small, and the result variance can be too large. That is also why I suggest running 10 series of 10-fold cross validation and reporting the average accuracy over the 100 runs -- to reduce the result variance on small datasets.
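The repeated cross-validation protocol suggested above can be sketched as follows. This is an illustrative outline, not the repo's actual script; `evaluate_fold` is a hypothetical hook standing in for training DGCNN on a fold and returning its test accuracy:

```python
import random

def repeated_kfold_mean(n_graphs, evaluate_fold, k=10, repeats=10, seed=0):
    """Run `repeats` independent rounds of k-fold cross validation and
    return the mean accuracy over all k * repeats fold evaluations.

    evaluate_fold(train_idx, test_idx) -> float is a placeholder for
    training the model and scoring it on the held-out fold.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(repeats):
        # Fresh random split for each round of cross validation.
        idx = list(range(n_graphs))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            accs.append(evaluate_fold(train_idx, test_idx))
    return sum(accs) / len(accs)
```

Averaging over 100 fold evaluations (10 rounds × 10 folds) dampens the variance that a single 10-fold split exhibits on a dataset as small as MUTAG (188 graphs).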

ChisamXz commented 5 years ago

Oh, thank you so much for the answer. I was a little confused about early stopping because the GAT code uses it. MUTAG indeed has a large accuracy variance. I was trying to run experiments with the same hyperparameters as yours; with bsize = 2, I got a result of around 85.5%. Your reply has already resolved my questions, thank you again for the patient replies.