Seeking for suggestions

nxznm commented 2 years ago

Dear authors, Sorry to bother you. This is a wonderful work and it gives me much inspiration! Recently, I have a question about the details of the paper. I notice a statement in Section 4.4 of your paper that "We retrieve monolingual FASTTEXT embeddings for each language separately, and align those into the same vector space afterwards. We use the sum of word embeddings as the final entity input representation. ". However, recent work [1] has clarified that utilizing (monolingual) name embeddings as side information is tricky for entity alignment. Hence, I try to remove the the initial entity embedding obtained by names in this line of code. Here is my modification:

class MyEmbedding(object):
    def __call__(self, data):
        vocab_size1, vocab_dim1 = data.x1.shape[0], data.x1.shape[2]
        vocab_size2, vocab_dim2 = data.x2.shape[0], data.x2.shape[2]
        data.x1, data.x2 = nn.init.xavier_uniform_(torch.zeros(vocab_size1, vocab_dim1)), nn.init.xavier_uniform_(torch.zeros(vocab_size2, vocab_dim2)) 
        return data

data = DBP15K(path, args.category, transform=MyEmbedding())[0].to(device)

psi_1 = RelCNN(data.x1.size(-1), args.dim, args.num_layers, batch_norm=False,
               cat=True, lin=True, dropout=0.5)
psi_2 = RelCNN(args.rnd_dim, args.rnd_dim, args.num_layers, batch_norm=False,
               cat=True, lin=True, dropout=0.0)
model = DGMC(psi_1, psi_2, num_steps=None, k=args.k).to(device)

optimizer_grouped_parameters = [
    {"params": [data.x1, data.x2], 'lr': 0.001},
    {"params": [p for p in model.parameters()], 'lr': 0.001},
]
optimizer = torch.optim.Adam(optimizer_grouped_parameters, lr=0.001)

We use random initialization in MyEmbedding(). Furthermore, we add the entity embeddings (i.e., data.x1 and data.x2) as parameters into the optimizer. However, the obtained results are really bad(nearly zero in Hits@1 on all datasets, including ZH->EN, JA->EN and FR->EN). I want to know whether the obtained results are reasonable. I also give some guesses on the results. I think the reason is that the proposed model in this paper adopts a two-stage neural architecture. The first stage will obtain an initial ranking of soft correspondences between entities in two knowledge graphs, while the second stage heavily relies on the output of the first stage. If the initial entity embeddings are randomly initialized, the output of the first stage will be meaningless. And the second stage cannot further refine the structural correspondences between knowledge graphs. I am not sure about this. Could you give me any suggestions? Thanks in advance!

[1] A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs

rusty1s commented 2 years ago

Are you sure that data.x1 and data.x2 are trained correctly? In particular, they do not seem to be wrapped as torch.nn.Parameter. Without training random embeddings, your graph matching objective cannot really succeed since there is no real signal in the data to perform the matching. With training the embeddings, you might be able to reach good training performance, but this may not apply for test nodes since their embeddings are not necessarily learned during training.

nxznm commented 2 years ago

Thanks for your valuable suggestions! I further wrap data.x1 and data.x2 with torch.nn.Parameter as follows:

data.x1 = torch.nn.Parameter(data.x1)
data.x2 = torch.nn.Parameter(data.x2)

optimizer_grouped_parameters = [
    {"params": [data.x1, data.x2], 'lr': 0.001},
    {"params": [p for p in model.parameters()], 'lr': 0.001},
]
optimizer = torch.optim.Adam(optimizer_grouped_parameters, lr=0.001)

Now, the value of data.x1 would change if I print it in each epoch of training! However, the test results of Hits@1 are still nearly zero. As you said that "With training the embeddings, you might be able to reach good training performance, but this may not apply for test nodes since their embeddings are not necessarily learned during training.", I also print the Hits@1 results on the train set. Like follows:

torch.no_grad()
def test():
    model.eval()

    _, S_L = model(data.x1, data.edge_index1, None, None, data.x2,
                   data.edge_index2, None, None)

    hits1 = model.acc(S_L, data.test_y)
    hits10 = model.hits_at_k(10, S_L, data.test_y)

    train_hits1 = model.acc(S_L, data.train_y)
    train_hits10 = model.hits_at_k(10, S_L, data.train_y)

    return hits1, hits10, train_hits1, train_hits10

I find the Hits@1 results on the train set would be high as you said! After the first stage, it will be 0.98. After the second stage (refining stage), it will be 0.40. Based on the results above, I think if without utilizing the (monolingual) name embeddings, the generalization of the proposed method is limited?

rusty1s commented 2 years ago

Yes, that is true. There exists some work on bringing learnable embeddings to the inductive case (such as NodePiece), but in general one does need meaningful features for performing graph matching.

nxznm commented 2 years ago

I got it! Thanks for your patience! Wish you have a nice day:)

rusty1s / deep-graph-matching-consensus

Seeking for suggestions #19