BowenYao18 opened this issue 1 month ago
In GNN training, we only care whether the graph topology is the same: for every pair of nodes a, b, your graph should contain exactly as many a -> b edges as the reference implementation's graph. The order of the edges does not matter, though; they will be rearranged in the COO -> CSC conversion process anyway, so you can also use the concatenation of cites_edge[0, :] and cites_edge[1, :] as the source and the concatenation of cites_edge[1, :] and cites_edge[0, :] as the destination.
https://github.com/mlcommons/training/blob/cdd928d4596c142c15a7d86b2eeadbac718c8da2/graph_neural_network/dataset.py#L137-L139
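As a quick sanity check, here is a minimal sketch (the tiny cites_edge tensor is made up for illustration) showing that both concatenation orders describe the same set of edges, which is all that matters for the topology:

```python
import torch

# Toy citation edges: column i is an edge cites_edge[0, i] -> cites_edge[1, i].
cites_edge = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])

# Reference ordering: reversed edges first, then the original edges.
src_a = torch.cat([cites_edge[1, :], cites_edge[0, :]])
dst_a = torch.cat([cites_edge[0, :], cites_edge[1, :]])

# Swapped ordering: original edges first, then the reversed edges.
src_b = torch.cat([cites_edge[0, :], cites_edge[1, :]])
dst_b = torch.cat([cites_edge[1, :], cites_edge[0, :]])

def edge_set(src, dst):
    # Comparing as a set of (src, dst) pairs ignores the COO column order.
    return set(zip(src.tolist(), dst.tolist()))

assert edge_set(src_a, dst_a) == edge_set(src_b, dst_b)
print("Both orderings give the same graph topology.")
```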
Let me use an example. The reference implementation builds the source and destination as (torch.cat([cites_edge[1, :], cites_edge[0, :]]), torch.cat([cites_edge[0, :], cites_edge[1, :]])). I had assumed that, since we must exactly follow MLPerf, we must use exactly this ordering and cannot have it the other way around, like (torch.cat([cites_edge[0, :], cites_edge[1, :]]), torch.cat([cites_edge[1, :], cites_edge[0, :]])).
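To double-check that the conversion really removes any dependence on this ordering, here is a minimal COO -> CSC sketch (coo_to_csc is a hypothetical helper written only for this example; real converters, e.g. in PyG / torch-sparse, do the equivalent sorting) showing that both orderings end up with the same CSC structure:

```python
import torch

def coo_to_csc(src, dst, num_nodes):
    # Hypothetical helper for this example: sort edges by destination
    # column to obtain CSC.
    order = torch.argsort(dst)
    row = src[order]
    # colptr[j] .. colptr[j + 1] spans the incoming edges of node j.
    colptr = torch.zeros(num_nodes + 1, dtype=torch.long)
    colptr[1:] = torch.cumsum(torch.bincount(dst, minlength=num_nodes), dim=0)
    return colptr, row

# Same toy tensor as in the sketch above.
cites_edge = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])

# Reference ordering vs. the swapped ordering from the question.
ref = (torch.cat([cites_edge[1, :], cites_edge[0, :]]),
       torch.cat([cites_edge[0, :], cites_edge[1, :]]))
swp = (torch.cat([cites_edge[0, :], cites_edge[1, :]]),
       torch.cat([cites_edge[1, :], cites_edge[0, :]]))

colptr_a, row_a = coo_to_csc(*ref, num_nodes=3)
colptr_b, row_b = coo_to_csc(*swp, num_nodes=3)

# The column pointers match exactly; within a column the rows may come
# out in a different order, so compare them per column as sorted lists.
assert torch.equal(colptr_a, colptr_b)
for j in range(3):
    rows_a = row_a[colptr_a[j]:colptr_a[j + 1]].sort().values
    rows_b = row_b[colptr_b[j]:colptr_b[j + 1]].sort().values
    assert torch.equal(rows_a, rows_b)
print("Same CSC structure either way.")
```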
Am I understanding this correctly? Does the order matter here? Thank you!