Closed (AdFiFi closed 2 weeks ago)
I believe it is common practice to set a range of matchable entities using a test set before computing similarity, which, in my view, relies on a 1-to-1 mapping assumption. This evaluation setting is widely used in other repositories as well. When running the evaluation code, the embeddings are typically sorted and filtered based on the test pairs before proceeding with the evaluation.
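A minimal sketch of the filtered evaluation described above, under stated assumptions: the function name `filtered_hits_at_1`, the array layout, and the toy embeddings are hypothetical and are not taken from this repository, OpenEA, or DualAMN. It shows that once both sides are indexed by the test pairs, the similarity matrix is square and the correct counterpart of row i is always column i.

```python
import numpy as np

def filtered_hits_at_1(src_emb, tgt_emb, test_pairs):
    """Hits@1 when the candidates are restricted to entities in the test pairs.

    Hypothetical helper for illustration only.
    """
    test_pairs = np.asarray(test_pairs)
    src_ids, tgt_ids = test_pairs[:, 0], test_pairs[:, 1]

    # Filter and reorder: row i of each matrix now belongs to test pair i.
    src = src_emb[src_ids]
    tgt = tgt_emb[tgt_ids]

    # Cosine similarity between every test source and every test target.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_test, n_test), i.e. square

    # Under the 1-to-1 assumption the ground truth lies on the diagonal.
    pred = sim.argmax(axis=1)
    hits1 = float((pred == np.arange(len(test_pairs))).mean())
    return sim, hits1

# Toy data: 5 source entities, 6 target entities, 3 test pairs.
src_emb = np.eye(5, 8)
tgt_emb = np.zeros((6, 8))
test_pairs = [(0, 3), (2, 1), (4, 5)]
for s, t in test_pairs:
    tgt_emb[t] = src_emb[s]          # aligned entities share an embedding
for k, t in enumerate([0, 2, 4]):
    tgt_emb[t, 5 + k] = 1.0          # unmatched targets get unrelated vectors

sim, hits1 = filtered_hits_at_1(src_emb, tgt_emb, test_pairs)
```

Note that target entities 0, 2, and 4 never appear as candidates here, which is exactly the effect of restricting the candidate set to the test pairs.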
Using ground-truth counterparts as candidates?
Yes. In almost all papers, only the test pairs are considered when calculating the embeddings. Our paper introduces small blocks to allow for scalability, and this filtering process is implemented within these small blocks. This is equivalent to filtering globally during the evaluation.
You can find similar implementations in OpenEA and DualAMN. I believe this approach adheres to the assumption of 1-to-1 mapping.
If you are interested in exploring beyond the 1-to-1 mapping assumption, you may want to look into the paper on knowledge graph alignment with dangling cases.
Thank you so much for your interest in our work. We are open to questions at any time.
But why aren't global_matrix and global_matrix_t in main.py square matrices? The size is the number of nodes in the source graph and the target graph, right?
Yes. I recall that when I implemented that, I used a sparse matrix so that the similarity matrix would not include the filtered entries. This allowed for filtered evaluation even though the matrix size remains the full size. This was the most convenient way to implement it since we need sparse matrices to store the similarity between a large number of items anyway, and the matrix size is just metadata, not reflecting the actual size of the data.
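A small sketch of the idea in this reply, assuming SciPy is available: a sparse matrix can declare the full source-by-target shape while storing only the similarities between filtered (test) entities. The names `global_sim` and `block_sim` and all values are hypothetical, and this is not the code in main.py.

```python
import numpy as np
from scipy.sparse import coo_matrix

n_src, n_tgt = 5, 6                       # full graph sizes (toy values)
test_pairs = np.array([[0, 3], [2, 1], [4, 5]])
src_ids, tgt_ids = test_pairs[:, 0], test_pairs[:, 1]

# Dense similarities computed only among the filtered entities.
block_sim = np.array([[0.9, 0.1, 0.2],
                      [0.3, 0.8, 0.1],
                      [0.2, 0.4, 0.7]])

# Scatter the block into global coordinates: entry (i, j) of block_sim
# is stored at (src_ids[i], tgt_ids[j]) in the global matrix.
rows = np.repeat(src_ids, len(tgt_ids))
cols = np.tile(tgt_ids, len(src_ids))
global_sim = coo_matrix(
    (block_sim.ravel(), (rows, cols)), shape=(n_src, n_tgt)
).tocsr()

# The declared shape is the full size, but only filtered entries are stored.
stored = global_sim.nnz                   # 9 stored values out of 30 cells
best_for_src0 = int(global_sim.toarray()[0].argmax())
```

In this sketch the shape is non-square (5 by 6) even though the stored block is square (3 by 3), so ranking over the stored entries still only considers test-set candidates.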
You could help me check whether this implementation is correct. If not, by fixing it, you would probably achieve a better score than mine.
Thank you for your answer and for sharing; it helps me understand this work better.
I noticed this because batch_sim is a square matrix. It should not be square, since you can't exclude nodes that don't have counterparts without some additional knowledge. But in lines 101-119 (the get_eval_ids() method), you directly select a matching set of nodes from the test set. Isn't the knowledge in the test set supposed to be unknown?