Closed · zjost closed this 2 years ago
Great question. The rule for using val/test labels (edges) is clearly stated here. In short, you should not use val/test edges for message passing or for computing your loss.
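To make that rule concrete, here is a minimal plain-Python sketch (not the actual OGB/PyG code; the `split_edge` dict and toy edges are illustrative stand-ins for an OGB-style split):

```python
# Toy OGB-style split: each split is a list of (u, v) node pairs.
split_edge = {
    "train": [(0, 1), (1, 2), (2, 3)],
    "valid": [(3, 4)],
    "test":  [(4, 0)],
}

def build_message_passing_adj(split_edge):
    """Adjacency used for message passing: TRAIN edges only, made
    undirected. Val/test edges never enter this set, and they should
    likewise never contribute to the training loss."""
    adj = set()
    for u, v in split_edge["train"]:
        adj.add((u, v))
        adj.add((v, u))
    return adj

adj_t = build_message_passing_adj(split_edge)
# (3, 4) is a validation edge, so it must not appear in either direction.
assert (3, 4) not in adj_t and (4, 3) not in adj_t
```

The point is simply that the graph the GNN propagates over is constructed from the training split alone.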
Thank you for the response. As I understand it, you're saying the "Type A" leaks described above are avoided by excluding val/test edges from ALL message passing operations. And I suppose whether or not you include the training edges for message passing is a choice of the modeler and if it's a problem, the leaderboard will reflect that. Is that a fair assessment?
I'll close the issue to tidy up, but please feel free to correct anything I've misstated for posterity.
Yes, your understanding is correct.
Just want to follow up on this question. The example code uses the entire edge set, including all training, validation, and test edges, for message passing (the `adj_t` built here). But for leaderboard comparison, we shouldn't use it, because it leaks the edge information. I checked the ogbl-ddi leaderboard and the code for the PLNLP and SEAL methods; it seems they both use `adj_t` for message passing. But that is not acceptable in this task setting, right?
`adj_t` only contains the training edges. Message passing should never use val/test edges.
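In other words, val/test edges show up only as prediction targets at evaluation time, while messages still flow over the train-only adjacency. A plain-Python illustration (the common-neighbors scorer and toy graph are stand-ins, not the actual OGB/PyG code):

```python
def common_neighbors_score(adj, u, v):
    """Toy link scorer: count shared neighbors in the train-only graph.
    `adj` is a set of directed (u, v) pairs (both directions stored)."""
    nbrs_u = {b for a, b in adj if a == u}
    nbrs_v = {b for a, b in adj if a == v}
    return len(nbrs_u & nbrs_v)

# Train-only adjacency for a path graph 0-1-2-3.
train_adj = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}

# Validation edges are scored as candidates but NEVER added to train_adj.
valid_edges = [(0, 2), (1, 3)]
scores = [common_neighbors_score(train_adj, u, v) for u, v in valid_edges]
# scores == [1, 1]: node 1 links 0-2, node 2 links 1-3.
```

The evaluation edges influence the metric, never the graph the model computes over.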
Hello team. Thanks for the wonderful work and project. I am wondering how exactly we are expected to handle edge masking in link prediction tasks. I've seen a number of issues (e.g., #213, #27) with similar confusion. As I understand it, there are two separate types of "information leak" that could exist:

- Type A: val/test edges are included in the graph used for message passing, so the model sees the very edges it is later asked to predict.
- Type B: the training supervision edges themselves are included in message passing when predicting those same edges.
My questions are as follows, as they relate to expectations and implementations used for the leaderboard:

- Should val/test edges be excluded from the adjacency used for message passing (e.g., the `adj_t` built here)?
- Should the training supervision edges being predicted be excluded from message passing, e.g., via the `exclude` argument in the `EdgeLoader`?

Ultimately, I want to make sure I'm making a fair comparison to the leaderboard, but I also wonder about the "right" way to measure performance on this task, regardless of decisions made for the leaderboard.
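To make the second point concrete, here is a plain-Python sketch of what I mean by excluding the current batch of supervision edges from message passing (an illustration of the concept only, not the actual `EdgeLoader` implementation):

```python
def adj_without_batch(train_adj, batch_edges):
    """Return the train adjacency with the current batch of supervision
    edges removed (both directions), so the model cannot simply read the
    label off the graph when predicting those same edges."""
    drop = set()
    for u, v in batch_edges:
        drop.add((u, v))
        drop.add((v, u))
    return train_adj - drop

train_adj = {(0, 1), (1, 0), (1, 2), (2, 1)}
batch = [(0, 1)]  # supervision edges being predicted in this step
mp_adj = adj_without_batch(train_adj, batch)
assert (0, 1) not in mp_adj and (1, 0) not in mp_adj
assert (1, 2) in mp_adj  # other train edges still carry messages
```

Whether the leaderboard expects this masking, or treats it as a modeling choice, is exactly what I'd like clarified.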
As a final request, perhaps the website can document these choices and expectations somewhere? Thanks!