[Dataset] Clarification on Dataset processing

https://github.com/mlcommons/training/blob/cdd928d4596c142c15a7d86b2eeadbac718c8da2/graph_neural_network/dataset.py#L137-L139

Let me use an example.

Assume we have an edge file like this:

[0, 1, 2]  # cites_edge[0, :]
[1, 2, 3]  # cites_edge[1, :]

We first apply

add_self_loops(remove_self_loops(paper_paper_edges)[0])[0]

which gives us this:

[0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :]
[1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :]
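
To double-check this step, here is a minimal plain-Python sketch of what I understand the two calls to do. The helpers `drop_self_loops` and `append_self_loops` are hypothetical stand-ins written for this example, not the PyTorch Geometric functions. They work on lists rather than tensors and return only the edges, whereas the real calls return a tuple, hence the `[0]` in the code above. I am assuming the node count is inferred as the largest index plus one.

```python
def drop_self_loops(edges):
    """Hypothetical helper: drop every (src, dst) column where src == dst."""
    src, dst = edges
    kept = [(s, d) for s, d in zip(src, dst) if s != d]
    return [s for s, _ in kept], [d for _, d in kept]


def append_self_loops(edges, num_nodes=None):
    """Hypothetical helper: append one (i, i) column per node.

    Assumption: when num_nodes is not given, it is max index + 1.
    """
    src, dst = edges
    if num_nodes is None:
        num_nodes = max(src + dst) + 1
    loops = list(range(num_nodes))
    return src + loops, dst + loops


cites_edge = ([0, 1, 2], [1, 2, 3])
cites_edge = append_self_loops(drop_self_loops(cites_edge))
print(cites_edge[0])  # [0, 1, 2, 0, 1, 2, 3]
print(cites_edge[1])  # [1, 2, 3, 0, 1, 2, 3]
```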

Then, we have its reverse edges:

[1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :]
[0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :]

If we follow this code

(torch.cat([cites_edge[1, :], cites_edge[0, :]]), torch.cat([cites_edge[0, :], cites_edge[1, :]]))

we should get this:

[1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]  # cites_edge[1, :] + cites_edge[0, :]
[0, 1, 2, 0, 1, 2, 3, 1, 2, 3, 0, 1, 2, 3]  # cites_edge[0, :] + cites_edge[1, :]
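
The concatenation can be reproduced by hand with plain lists. This is only a sketch: list `+` stands in for `torch.cat` on 1-D tensors, and `row0` / `row1` are names I made up for `cites_edge[0, :]` and `cites_edge[1, :]` after the self-loop step.

```python
# State after the self-loop step (plain lists standing in for tensor rows).
row0 = [0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :]
row1 = [1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :]

# List "+" plays the role of torch.cat on 1-D tensors.
first = row1 + row0   # cites_edge[1, :] followed by cites_edge[0, :]
second = row0 + row1  # cites_edge[0, :] followed by cites_edge[1, :]

print(first)   # [1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]
print(second)  # [0, 1, 2, 0, 1, 2, 3, 1, 2, 3, 0, 1, 2, 3]
```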

Since we must follow the MLPerf reference exactly, we cannot use the opposite order, (torch.cat([cites_edge[0, :], cites_edge[1, :]]), torch.cat([cites_edge[1, :], cites_edge[0, :]])), which would instead give this:

[0, 1, 2, 0, 1, 2, 3, 1, 2, 3, 0, 1, 2, 3]  # cites_edge[0, :] + cites_edge[1, :]
[1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]  # cites_edge[1, :] + cites_edge[0, :]
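
One way to compare the two orderings is to look at the (first row, second row) pairs they produce. The sketch below uses plain lists again, with `row0`, `row1`, `order_a`, and `order_b` as hypothetical names for this example. For this toy input the two orderings contain the same multiset of pairs and differ only in the sequence of columns. It does not show whether anything downstream in the reference depends on that sequence.

```python
from collections import Counter

row0 = [0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :] after the self-loop step
row1 = [1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :] after the self-loop step

# Ordering used in the referenced code vs. the opposite ordering.
order_a = (row1 + row0, row0 + row1)
order_b = (row0 + row1, row1 + row0)

# Count each (first row, second row) column pair, ignoring column position.
pairs_a = Counter(zip(*order_a))
pairs_b = Counter(zip(*order_b))

same_multiset = pairs_a == pairs_b
same_sequence = order_a == order_b
print(same_multiset)  # True
print(same_sequence)  # False
```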

Am I understanding this correctly? Does the order matter here? Thank you!

[Dataset] Clarification on Dataset processing #767