snap-stanford / relbench

RelBench: Relational Deep Learning Benchmark
https://relbench.stanford.edu
MIT License

Differences between `idgnn_link.py` and `gnn_link.py` #262

Closed: quang-truong closed this issue 1 month ago

quang-truong commented 2 months ago

Hi,

I have been trying to understand the differences between the examples `idgnn_link.py` and `gnn_link.py`. There are a few notable differences:

If I understand correctly, `idgnn_link.py` samples a subgraph around each source node, and the model learns the embeddings of nodes in the destination table. In `gnn_link.py`, on the other hand, positive and negative destination nodes are sampled, and the model is trained with a loss over scores computed as the element-wise product of source and destination embeddings, summed over the feature dimension (i.e., their dot product).

Please correct me if I have misunderstood anything. My main questions are:
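A toy sketch may make the contrast between the two setups more concrete. It assumes that `gnn_link.py` scores a (source, destination) pair by the dot product of two independently computed embeddings, and that `idgnn_link.py` follows the identity-aware idea: the source (seed) node of a sampled subgraph is tagged with a learned embedding, and the model emits one logit per destination-table node in that subgraph, trained with binary cross-entropy. `two_tower_scores`, `ToyIDGNNScorer`, and the single mean-aggregation layer are hypothetical stand-ins, not RelBench code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def two_tower_scores(x_src, x_dst):
    """Two-tower scoring (assumed gnn_link.py style): one score per pair."""
    # Dot product of independently computed embeddings, shape [B].
    return torch.sum(x_src * x_dst, dim=1)


class ToyIDGNNScorer(nn.Module):
    """Hypothetical identity-aware scorer over one sampled subgraph."""

    def __init__(self, dim):
        super().__init__()
        self.seed_emb = nn.Parameter(torch.zeros(dim))  # tags the source node
        self.lin = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, x, adj, seed, dst_mask):
        # x: [N, D] node features, adj: [N, N] 0/1 adjacency of the subgraph,
        # seed: index of the source node, dst_mask: [N] bool, destination-table nodes.
        is_seed = torch.zeros(x.size(0), 1)
        is_seed[seed] = 1.0
        x = x + is_seed * self.seed_emb  # identity awareness: mark the seed
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = F.relu(self.lin(adj @ x / deg + x))  # one mean-aggregation layer
        return self.head(h[dst_mask]).squeeze(-1)  # one logit per dst node


# Usage on a random 5-node subgraph whose last 3 nodes are destination nodes.
torch.manual_seed(0)
x, adj = torch.randn(5, 4), (torch.rand(5, 5) > 0.5).float()
dst_mask = torch.tensor([False, False, True, True, True])
logits = ToyIDGNNScorer(4)(x, adj, seed=0, dst_mask=dst_mask)
# Target is 1 if the source actually links to that destination node.
loss = F.binary_cross_entropy_with_logits(logits, torch.tensor([1.0, 0.0, 0.0]))
```

In this sketch the two-tower score never conditions the destination embedding on the particular source, whereas the identity-aware scorer computes destination logits from a subgraph in which that source is explicitly marked, at the cost of one subgraph per source.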

  1. Which setting is officially adopted for Link Prediction in the RelBench paper?
  2. What is the intuition behind the `share_same_time` argument for `LinkNeighborLoader`? And why does it affect how negative scores are calculated? Below is a snippet from `gnn_link.py`:

     ```python
     pos_score = torch.sum(x_src * x_pos_dst, dim=1)
     if args.share_same_time:
         # [batch_size, batch_size]
         neg_score = x_src @ x_neg_dst.t()
         # [batch_size, 1]
         pos_score = pos_score.view(-1, 1)
     else:
         # [batch_size, ]
         neg_score = torch.sum(x_src * x_neg_dst, dim=1)
     ```
  3. I could not find details in the RelBench paper about the samplers used for each dataset. I also noticed other arguments that are not covered in the paper, such as `temporal_strategy`, `subgraph_type`, etc.
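The snippet in question 2 can be read as two ways of forming negatives. The sketch below assumes that `share_same_time=True` makes all source nodes in a batch share one seed time, so every sampled negative destination is temporally valid for every source and the full `[batch_size, batch_size]` score matrix can be used; without it, negative `i` is assumed to be valid only for source `i`. `link_scores` and `bpr_style_loss` are hypothetical names, and the softplus (BPR-style) loss is an assumption chosen because it broadcasts over both shapes.

```python
import torch
import torch.nn.functional as F


def link_scores(x_src, x_pos_dst, x_neg_dst, share_same_time):
    """Score positive and negative destinations for a batch of sources."""
    # Positive score per source: dot product with its own positive destination.
    pos_score = torch.sum(x_src * x_pos_dst, dim=1)
    if share_same_time:
        # Assumed: all sources share one seed time, so every negative is a
        # valid candidate for every source -> [batch_size, batch_size].
        neg_score = x_src @ x_neg_dst.t()
        # [batch_size, 1] so it broadcasts against each row of neg_score.
        pos_score = pos_score.view(-1, 1)
    else:
        # Negative i is only paired with source i -> [batch_size].
        neg_score = torch.sum(x_src * x_neg_dst, dim=1)
    return pos_score, neg_score


def bpr_style_loss(pos_score, neg_score):
    # softplus(neg - pos) == -log(sigmoid(pos - neg)); works for both shapes.
    return F.softplus(neg_score - pos_score).mean()


torch.manual_seed(0)
x_src, x_pos, x_neg = torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8)
pos, neg = link_scores(x_src, x_pos, x_neg, share_same_time=True)
print(pos.shape, neg.shape)  # torch.Size([4, 1]) torch.Size([4, 4])
```

Under this reading, the shared-time branch compares each source against `batch_size` negatives instead of one, and the diagonal of the score matrix equals the per-pair negative scores of the other branch.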
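Regarding question 3: `temporal_strategy` and `subgraph_type` also appear as arguments of PyTorch Geometric's `NeighborLoader`. Per the PyG documentation, under temporal sampling `temporal_strategy="uniform"` samples uniformly among neighbors that satisfy the temporal constraint, while `"last"` takes the most recent ones. The pure-Python sketch below illustrates only that selection rule on a made-up neighbor list; `sample_temporal_neighbors` is a hypothetical helper, not a PyG or RelBench function.

```python
import random


def sample_temporal_neighbors(neighbors, seed_time, k, strategy="uniform", rng=None):
    """Illustrative temporal neighbor selection (hypothetical helper).

    neighbors: list of (node_id, timestamp) pairs for one seed node.
    Only neighbors with timestamp <= seed_time are eligible (no future leakage).
    """
    if strategy not in ("uniform", "last"):
        raise ValueError(f"unknown temporal strategy: {strategy!r}")
    valid = [(n, t) for n, t in neighbors if t <= seed_time]
    if k <= 0 or not valid:
        return []
    if strategy == "last":
        # Keep the k most recent eligible neighbors, oldest first.
        valid.sort(key=lambda nt: nt[1])
        return [n for n, _ in valid[-k:]]
    # "uniform": k eligible neighbors at random, without replacement.
    rng = rng or random.Random()
    return [n for n, _ in rng.sample(valid, min(k, len(valid)))]


nbrs = [("a", 1), ("b", 3), ("c", 5), ("d", 7)]
print(sample_temporal_neighbors(nbrs, seed_time=5, k=2, strategy="last"))  # ['b', 'c']
```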