Closed songsong0425 closed 8 months ago
Hi there!
I would gladly work on that subject :)
I have some questions with regard to the implementation details.
I noticed that the hits@k metric was implemented as part of the test method of the KGE base class: torch_geometric.nn.kge.base.
What would be the best approach to implement ndcg@k?

- Add a parameter (e.g. metric=ndcg / hits) to the test function to choose which metric to compute (hits@k or ndcg@k) and compute it inside this test method.
- Add a dedicated method for ndcg@k (e.g. test_ndgck).
- Compute both and extend the return signature (Tuple[float, float] -> Tuple[float, float, float, float]).
).In its general form, if I understand it correctly, NDCG allows gains from different possible tails for a single head. For example, for a link prediction task, any of the neighbors (tails) of a given node (head) can bring a gain.
Currently, hits@k allows a single tail per head.
Do we want to let the user give a list of different tails (fixed number or flexible)? (again, a possible change in the signature of the method)
def test(
    self,
    head_index: Tensor,
    rel_type: Tensor,
    tail_indexes: List[Tensor],
    batch_size: int,
    k: int = 10,
    log: bool = True,
)
Or do we want to keep the one head - one tail limit?
def test(
    self,
    head_index: Tensor,
    rel_type: Tensor,
    tail_index: Tensor,
    batch_size: int,
    k: int = 10,
    log: bool = True,
)
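To make the "gains from multiple tails" point concrete, here is a minimal NDCG@k sketch in plain Python (the actual PyG implementation would operate on torch tensors; the function and argument names here are purely illustrative). With a single true tail it collapses to a rank-discounted hit, while with several true tails each one found within the top k contributes a gain:

```python
import math

def ndcg_at_k(ranked_tails, true_tails, k=10):
    """NDCG@k for one head, with binary gain per true tail.

    ranked_tails: tail ids sorted by predicted score (best first).
    true_tails:   set of tails actually linked to the head.
    """
    dcg = sum(
        1.0 / math.log2(rank + 2)  # ranks are 0-based, hence +2
        for rank, tail in enumerate(ranked_tails[:k])
        if tail in true_tails
    )
    # Ideal DCG: all true tails packed at the top of the ranking.
    ideal_hits = min(len(true_tails), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# One true tail: a discounted hit (tail 3 sits at rank 1).
ndcg_at_k([5, 3, 9], {3}, k=10)     # → 1 / log2(3) ≈ 0.63
# Several true tails: both 5 and 3 contribute, giving a perfect score here.
ndcg_at_k([5, 3, 9], {3, 5}, k=10)  # → 1.0
```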
Thanks! :)
I would prefer we implement these as metrics that follow the design of torchmetrics
, such that we can compute them in a mini-batch fashion during training. This would then be something like
def update(top_k_pred_mat, edge_label_index):
    pass
Alright, I understand the plan for a torchmetrics-style design, but I'm a bit lost with regard to the signature.
I understand that we would like a class that looks a bit like this:
class NdcgK():
    def __init__(self):
        self.dcg = 0
        self.idcg = 0

    def update(self, top_k_pred_mat: Tensor, edge_label_index: Tensor):
        pass

    def compute(self):
        return self.dcg / self.idcg
And what about top_k_pred_mat and edge_label_index? Is top_k_pred_mat the matrix of all possible node pairs where only the top k predictions have been kept, and edge_label_index the valid node pairs / positive edges?
I am a bit confused as to the choice of these inputs. Why not leave the responsibility to the metric to select the top k values? Is it to be able to choose k dynamically?
And then, if it is a mini-batch, don't we also want the batch_index to be able to reset the log factors? I think the torchmetrics implementation of NDCG does that. Actually, do we want to reuse that implementation and simply develop a link-prediction wrapper around it, if this is only meant as an example?
I may have misunderstood something along the way, sorry about that.
edge_label_index would be a [2, num_labels] matrix, so it basically comes paired with batch_index. Let me know if I am missing something.

k should definitely be an attribute during initialization of the metric, and so top_k_pred_mat would just require size(1) >= k.
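A small sketch of why the [2, num_labels] layout already carries the grouping that batch_index would provide: grouping the columns of edge_label_index by head recovers the per-head positive sets, including multiple true tails per head (plain Python lists stand in for tensors; the helper name is illustrative):

```python
from collections import defaultdict

def positives_per_head(edge_label_index):
    # edge_label_index: [2, num_labels] — row 0 holds heads, row 1 tails,
    # so several columns may share the same head (multiple true tails).
    heads, tails = edge_label_index
    pos = defaultdict(set)
    for h, t in zip(heads, tails):
        pos[h].add(t)
    return dict(pos)

positives_per_head([[0, 0, 1], [3, 5, 7]])  # → {0: {3, 5}, 1: {7}}
```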
Yes! I think you are right about the batch_index not being needed.
I have tried to create a first implementation in a new branch so that I can get a more concrete picture. We can also use it as a basis for discussing the next updates.
Dear PyG community,
Hi, thank you as always for your effort in package maintenance and development. When I tried to implement the link prediction metrics (i.e., hits@k, ndcg@k), I couldn't find example code for ndcg@k. Could you consider adding an example of ndcg@k to the evaluation metrics? If it already exists and I missed it, can anyone tell me where I can find it?
Thank you for reading!
_Originally posted by @songsong0425 in https://github.com/pyg-team/pytorch_geometric/discussions/7792_