fratajcz opened 9 months ago
I agree we can do a better job at documenting `RandomLinkSplit`. Please note that only the resulting `edge_label_index` tensors are guaranteed to be disjoint. The `edge_index` tensors (i.e., the edges used for message passing) are mostly shared across splits, with the small exception that the validation edges are also used for message passing during testing.
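A minimal pure-Python sketch of the split semantics described in this comment may help. The function `toy_random_link_split` is hypothetical and is not PyG's implementation: it ignores negative sampling and undirected graphs, and represents edges as (src, dst) tuples. It shows that the supervision edges (`edge_label_index`) partition the input, while the message-passing edges (`edge_index`) are shared, with test-time message passing also including the validation edges.

```python
import random


def toy_random_link_split(edges, num_val, num_test, seed=0):
    """Hypothetical, simplified model of the split semantics described above.

    Only positive edges are modelled: the real RandomLinkSplit additionally
    samples negative edges and handles undirected graphs, both omitted here.
    """
    edges = list(edges)
    random.Random(seed).shuffle(edges)
    test_pos = edges[:num_test]
    val_pos = edges[num_test:num_test + num_val]
    train_pos = edges[num_test + num_val:]
    return {
        # "edge_label_index": supervision edges, disjoint across splits.
        # "edge_index": message-passing edges, shared across splits.
        "train": {"edge_index": train_pos, "edge_label_index": train_pos},
        "val": {"edge_index": train_pos, "edge_label_index": val_pos},
        # At test time, validation edges are also used for message passing.
        "test": {"edge_index": train_pos + val_pos, "edge_label_index": test_pos},
    }


edges = [(i, (i + 1) % 10) for i in range(10)]  # directed 10-node cycle
splits = toy_random_link_split(edges, num_val=2, num_test=2)

for name, split in splits.items():
    print(name, len(split["edge_index"]), len(split["edge_label_index"]))
# train 6 6
# val 6 2
# test 8 2
```

In this model, only the `edge_label_index` entries can be compared with the output of sklearn's `train_test_split`; the `edge_index` entries are the graph each split is allowed to see.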
Ah, I think I'm beginning to see how the output is meant to be used; thanks for clarifying. It makes sense, just not in the way I expected.
📚 Describe the documentation issue
Hi!
I think the `RandomLinkSplit` class is crucial for reproducible results in link prediction use cases, and so far it cannot easily be substituted by any other splitting class I have come across. However, its behaviour on undirected graphs is so unintuitive that it had me questioning my sanity for a little while. Coming from sklearn's `train_test_split` function, I was expecting the three objects returned by `RandomLinkSplit` to be disjoint subsets of the initial `edge_index`. Arriving at this result, however, requires some additional steps:

Example:
Output:
Expected behaviour:
`train_data.edge_index`, `val_data.edge_index` and `test_data.edge_index` are disjoint subsets of `data.edge_index`, as shown in the last three print statements.

Observed behaviour:
The `edge_index` tensors of the three returned data objects are almost identical, and I can't understand why that would be desired. Several steps are necessary to achieve the expected behaviour, which requires good knowledge of the class.

Suggest a potential alternative/fix
State in the documentation that the splitting does not happen on the returned `edge_index` tensors.
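Following the explanation that only `edge_label_index` is disjoint, one way to recover disjoint edge sets is sketched below in plain Python: keep the positive supervision edges of each split and, for an undirected graph, add the reverse directions. The helper `positive_edges` and the toy data are hypothetical. Edges are (src, dst) tuples instead of PyG's 2 x E tensors, and the sketch assumes that each undirected positive edge appears only once in `edge_label_index`, with sampled negatives marked by label 0. With real tensors, the analogous steps would be masking with `edge_label == 1` and calling `torch_geometric.utils.to_undirected`.

```python
def positive_edges(edge_label_index, edge_label, undirected=True):
    """Hypothetical helper: keep positive supervision edges (label 1) and,
    for undirected graphs, add the reverse direction of each one."""
    pos = [e for e, y in zip(edge_label_index, edge_label) if y == 1]
    if undirected:
        pos += [(v, u) for (u, v) in pos]
    return set(pos)


# Toy undirected graph: every edge is stored in both directions,
# as in an edge_index for an undirected graph.
full = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}
full |= {(v, u) for (u, v) in full}

# Hand-made supervision edges per split: one direction per undirected
# positive edge, plus one negative edge (1, 3) with label 0.
split_labels = {
    "train": ([(0, 1), (1, 2), (2, 3), (1, 3)], [1, 1, 1, 0]),
    "val": ([(0, 3), (1, 3)], [1, 0]),
    "test": ([(0, 2), (1, 3)], [1, 0]),
}

train, val, test = (positive_edges(*split_labels[s]) for s in ("train", "val", "test"))

# Pairwise disjoint, and together they cover the full undirected graph.
print(train.isdisjoint(val), train.isdisjoint(test), val.isdisjoint(test))  # True True True
print(train | val | test == full)  # True
```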