Hi! This is caused by a bug in how pytorch-lightning computes accuracy: see https://github.com/snap-stanford/ogb/discussions/141#discussioncomment-584011 and https://github.com/PyTorchLightning/pytorch-lightning/issues/6889. I believe batch_size=1 gives you the most accurate result, but you may need to re-implement the evaluation code yourself until the bug in pytorch-lightning is fixed.
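To see why the logged number depends on the batch size: pytorch-lightning averaged the logged per-batch accuracies uniformly, so a smaller final batch gets over-weighted. A toy illustration with made-up numbers:

```python
# Toy illustration (numbers made up): averaging per-batch accuracies
# uniformly over-weights a smaller last batch.
batch_sizes = [1024, 1024, 512]    # last batch is smaller
batch_accs  = [0.70, 0.70, 0.90]   # and happens to score higher

unweighted = sum(batch_accs) / len(batch_accs)  # what gets logged
weighted = sum(a * n for a, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)

print(f'{unweighted:.4f}')  # 0.7667 -- skewed by the 512-sample batch
print(f'{weighted:.4f}')    # 0.7400 -- true dataset-level accuracy
```

With batch_size=1 every per-batch accuracy is 0 or 1, so the uniform mean coincides with the true dataset-level accuracy, which is why batch_size=1 gives you the right number.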
Thanks for the answer, and sorry, I missed that it was already mentioned in the discussion.
I had also tried setting `datamodule.sizes` to a ridiculously large value, as suggested there, to eliminate the sampling uncertainty. However, when I recalculated the accuracy from the output file, I had trouble reproducing the high 71.7% that `trainer.test` reports with batch_size=1.
Since this is a bug in pytorch-lightning, I'm going to trust the accuracy from my own calculation.
Yes, I believe the correct code should give you 71.7%. Let us know if this is the case.
Actually, there seems to be an easy workaround: https://github.com/PyTorchLightning/pytorch-lightning/issues/6889#issuecomment-830234986
I will take care of adding the workaround to our examples.
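In the meantime, here is a minimal sketch of what such a count-based fix can look like; the model and batch format below are placeholders for illustration, not the actual MAG240M example code:

```python
import torch
import pytorch_lightning as pl
from torchmetrics import Accuracy

class LitClassifier(pl.LightningModule):
    """Minimal sketch of the count-based workaround (placeholder model)."""

    def __init__(self, in_dim: int = 16, num_classes: int = 3):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, num_classes)
        # The metric accumulates running correct/total counts across batches.
        self.val_acc = Accuracy(task='multiclass', num_classes=num_classes)

    def forward(self, x):
        return self.lin(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        self.val_acc(y_hat.softmax(dim=-1), y)
        # Log the Metric *object*, not a per-batch float: the epoch-level
        # value then comes from val_acc.compute() (correct / total), so
        # unequal batch sizes cannot skew the average.
        self.log('val_acc', self.val_acc, on_step=False, on_epoch=True)
```

Because Lightning calls compute() on the logged Metric object at epoch end, the reported value is a ratio of accumulated counts rather than an unweighted mean of per-batch values.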
This is now fixed in the example scripts. We will update the validation accuracy soon.
The validation accuracy and test accuracy have been updated with the new code: https://github.com/snap-stanford/ogb/blob/master/examples/lsc/mag240m/README.md
Using rgnn.py, I reproduced the validation accuracy described in README.md (70.48%).
On the other hand, I noticed that the accuracy changes if I modify the hard-coded batch_size in line 433 to speed things up: with batch_size=1024 the accuracy decreased, while with batch_size=1 it increased.
I wondered whether the remainder left over after splitting the data into batches was being truncated, but even if ~1,000 samples were dropped, I don't see why that would change the accuracy by 1% on more than 130,000 validation samples.
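A quick back-of-the-envelope check supports that intuition (all numbers below are assumed for illustration): even in the worst case, where every truncated sample would have been classified correctly, dropping ~1,000 out of ~131,000 samples moves the accuracy by only about 0.2 points, far less than the 1% observed.

```python
# Worst case: all ~1,000 truncated samples would have been correct.
n_total, n_dropped, acc = 131_000, 1_000, 0.70   # assumed numbers

correct_full = acc * n_total
acc_truncated = (correct_full - n_dropped) / (n_total - n_dropped)

print(f'{acc:.4f} -> {acc_truncated:.4f}')  # 0.7000 -> 0.6977 (~0.23 points)
```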
In addition, I tried outputting y_pred using `save_test_submission()` in mag240m.py, and the output was almost the same regardless of the batch size. Why does changing `datamodule.batch_size` affect the accuracy? Is the accuracy calculated on only a sampled portion of the validation data, so that it varies with the batch size?
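For what it's worth, one way to get a number that is independent of the batching is to recompute the accuracy offline from the saved predictions with the OGB evaluator. A rough sketch, assuming the predictions and ground-truth validation labels have been dumped to the hypothetical .npy files below:

```python
import numpy as np
from ogb.lsc import MAG240MEvaluator

# Hypothetical dumps of the model's validation predictions and the
# ground-truth labels; adjust the paths to wherever you saved them.
y_pred = np.load('y_pred_valid.npy')
y_true = np.load('y_true_valid.npy')

evaluator = MAG240MEvaluator()
result = evaluator.eval({'y_pred': y_pred, 'y_true': y_true})
print(result['acc'])  # dataset-level accuracy, independent of batch size
```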