snap-stanford / GraphGym

Platform for designing and evaluating Graph Neural Networks (GNN)

Just the validation performance reported? #28

Closed Theheavens closed 3 years ago

Theheavens commented 3 years ago

I see in Section 7.2 (Experimental Setup) of the paper:

For all the experiments in Sections 7.3 and 7.4, we use a consistent setup, where results on three random 80%/20% train/val splits are averaged, and the validation performance in the final epoch is reported.

In Sections 7.3 and 7.4, the performance used in the ranking analysis is always the validation performance. First, "the validation performance in the final epoch" means training runs through all epochs and the last epoch's validation result is reported, am I right? Second, I am wondering why we don't use early stopping and report the test performance mentioned below, i.e., the test performance at the best validation epoch.

how to report the performance (e.g., final epoch or the best validation epoch) in Section 7.1.

JiaxuanYou commented 3 years ago

In Sections 7.3 and 7.4, the performance used in the ranking analysis is always the validation performance. First, "the validation performance in the final epoch" means training runs through all epochs and the last epoch's validation result is reported, am I right?

Yes, that is how the experiments were done in our "Design Space for GNN" paper. The main reason is that we wanted to inspect the effect of the number of training epochs as well. If early stopping is used, the training-epoch parameter does not make a difference.

Second, I am wondering why we don't use early stopping and report the test performance mentioned below, i.e., the test performance at the best validation epoch.

In practice, we should focus on the epoch where the validation performance is best. GraphGym conveniently provides this in "test_best.csv".
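To make the distinction concrete, here is a minimal sketch (not GraphGym's actual implementation) of the two reporting schemes: taking the final-epoch metric versus selecting the epoch with the best validation performance and reporting the test metric at that epoch. The function name and the per-epoch accuracy lists are hypothetical illustrations.

```python
# Hypothetical per-epoch metrics; in GraphGym these would come from the
# logged training statistics, not from hand-written lists.
def best_val_epoch_test(val_acc, test_acc):
    """Return (best validation epoch, test accuracy at that epoch)."""
    best_epoch = max(range(len(val_acc)), key=lambda e: val_acc[e])
    return best_epoch, test_acc[best_epoch]

val = [0.60, 0.72, 0.75, 0.74]   # validation accuracy per epoch (made up)
test = [0.58, 0.70, 0.73, 0.76]  # test accuracy per epoch (made up)

# Final-epoch reporting (what the paper's ranking analysis uses, on val):
final_val = val[-1]                       # -> 0.74

# Best-validation-epoch reporting (what "test_best.csv" corresponds to):
epoch, acc = best_val_epoch_test(val, test)  # -> (2, 0.73)
```

Note that the best-validation epoch (2 here) need not be the epoch with the best test accuracy; selecting on validation avoids peeking at the test set.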