pykeen / pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings
https://pykeen.readthedocs.io/en/stable/
MIT License

Recommend best reproducing settings and parameters to reproduce ComplEx+N3 #1155

Open sylviawangfr opened 2 years ago

sylviawangfr commented 2 years ago

Problem Statement

I tried to reproduce the best ComplEx + N3 scores on the FB15k-237 dataset. The settings are as below:

```python
# `ds` is assumed to be an alias for the pykeen.datasets module.
import pykeen.datasets as ds
from pykeen.evaluation import RankBasedEvaluator
from pykeen.losses import CrossEntropyLoss
from pykeen.models import ComplEx
from pykeen.pipeline import pipeline
from pykeen.regularizers import LpRegularizer

pipeline_result = pipeline(
    dataset=ds.FB15k237(),
    model=ComplEx,
    model_kwargs=dict(
        embedding_dim=512,
        entity_initializer="xavier_uniform",
        relation_initializer="xavier_uniform",
    ),
    loss=CrossEntropyLoss,
    loss_kwargs={"reduction": "mean"},
    # p=3.0 is used here to obtain the N3 penalty
    regularizer=LpRegularizer,
    regularizer_kwargs=dict(weight=5e-2, p=3.0),
    optimizer="adagrad",
    optimizer_kwargs=dict(lr=0.5),
    evaluator=RankBasedEvaluator,
    training_loop="SLCWA",
    negative_sampler="basic",
    negative_sampler_kwargs={"num_negs_per_pos": 10},
    training_kwargs={"num_epochs": 1000, "batch_size": 1024},
)
```

The scores are much lower than those reported in the papers. I also tried the parameters from the paper "You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings" and from the repo https://github.com/facebookresearch/kbc, but the scores were even lower. I find it pretty hard to find the best parameters. Could PyKEEN provide settings/parameters for FB15k-237 that reproduce the reported scores, especially for ComplEx and Canonical Tensor Decomposition?

Describe the solution you'd like

Could PyKEEN provide settings/parameters or checkpoints for datasets that reproduce the scores reported in papers?

Describe alternatives you've considered

Could PyKEEN provide settings/parameters or checkpoints for FB15k-237 that reproduce the scores reported in the papers, especially for ComplEx and Canonical Tensor Decomposition?

Additional information

No response

mberr commented 1 year ago

Hi,

Reproducing results from KGE papers is unfortunately not always easy. We curate some experiment configurations at src/pykeen/experiments, which you can run from the console as described in the README: https://github.com/pykeen/pykeen#reproduction
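
As a rough sketch of what running a curated configuration from the console looks like, the snippet below builds the `pykeen experiments reproduce <model> <reference> <dataset>` command that the README section describes. The helper `reproduce_command` and the `run` flag are made up for illustration. The example triple (`tucker`, `balazevic2019`, `fb15k`) is the one used in the README; whether a ComplEx configuration for FB15k-237 is shipped depends on the files present in src/pykeen/experiments.

```python
import shlex
import shutil
import subprocess


def reproduce_command(model: str, reference: str, dataset: str) -> list[str]:
    """Build the argument list for `pykeen experiments reproduce` (hypothetical helper)."""
    return ["pykeen", "experiments", "reproduce", model, reference, dataset]


def main(run: bool = False) -> str:
    # Example arguments from the PyKEEN README; swap in the model/reference/dataset
    # of a configuration file that actually exists in src/pykeen/experiments.
    cmd = reproduce_command("tucker", "balazevic2019", "fb15k")
    printable = shlex.join(cmd)
    print(printable)
    # Launch the (potentially very long) training run only when explicitly
    # requested and when the `pykeen` executable is available on PATH.
    if run and shutil.which("pykeen") is not None:
        subprocess.run(cmd, check=True)
    return printable


if __name__ == "__main__":
    main(run=False)
```

Leaving `run=False` only prints the command, which makes it easy to check the arguments before starting a full training run.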

In addition, we make all configurations from our reproducibility and benchmarking study available at https://github.com/pykeen/benchmarking. Please note that these experiments were run with an older version of PyKEEN, and you may need to adjust some parameter names to account for changes made since then.
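
Adjusting parameter names in an older configuration can be done mechanically once you know which names changed. The following is a minimal sketch under stated assumptions: the rename table is a placeholder (`old_parameter_name` and `new_parameter_name` are not real PyKEEN parameters), and the toy dictionary stands in for a configuration file loaded with `json.load()`. The actual renames have to be looked up in the changelog of the PyKEEN version you are running.

```python
from typing import Any, Mapping

# HYPOTHETICAL rename table: these names are placeholders, not real PyKEEN
# parameters. Fill it in from the changelog of the version you are running.
RENAMES = {"old_parameter_name": "new_parameter_name"}


def rename_keys(node: Any, renames: Mapping[str, str]) -> Any:
    """Return a copy of a nested JSON-like structure with dictionary keys renamed."""
    if isinstance(node, dict):
        return {
            renames.get(key, key): rename_keys(value, renames)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_keys(item, renames) for item in node]
    return node


# Toy configuration standing in for a file loaded with json.load().
old_config = {
    "pipeline": {
        "model": "ComplEx",
        "model_kwargs": {"embedding_dim": 512, "old_parameter_name": True},
    }
}
new_config = rename_keys(old_config, RENAMES)
print(new_config["pipeline"]["model_kwargs"])
```

The function builds a new structure and leaves the input untouched, so the original configuration stays available for comparison.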