pykeen / pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings
https://pykeen.readthedocs.io/en/stable/
MIT License

Unable to reproduce Table 9 of the paper for the algorithms marked in Orange -- row R #1219

Open kaushik333 opened 1 year ago

kaushik333 commented 1 year ago

Describe the bug

I am trying to reproduce Table 9 in the paper (https://arxiv.org/pdf/2006.13365.pdf). I am using the hyperparameter JSON files in the experiments folder (https://github.com/pykeen/pykeen/tree/master/src/pykeen/experiments). For example, for the ConvE algorithm on the FB15k-237 dataset, I use (https://github.com/pykeen/pykeen/blob/master/src/pykeen/experiments/conve/dettmers2018_conve_fb15k237.json).
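Before launching a run, it can help to confirm which hyperparameters a configuration file actually contains. The following is a minimal sketch, assuming the experiment files keep their settings under a top-level `pipeline` key (as the linked files appear to do); the helper name `summarize_config` is hypothetical and not part of PyKEEN.

```python
import json
from pathlib import Path


def summarize_config(path):
    """Return the "pipeline" section of an experiment JSON file.

    Assumption: settings live under a top-level "pipeline" key. If that key
    is missing, the whole parsed document is returned instead.
    """
    config = json.loads(Path(path).read_text())
    return config.get("pipeline", config)


# Example usage (the path is a placeholder, as in the script in this report):
# for key, value in sorted(summarize_config("<path_to_config_files>/fb15k237_conve.json").items()):
#     print(f"{key}: {value}")
```

Printing the section for each of the six files makes it easier to compare the settings being run against those reported in the paper.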

Using these hyperparameters, I can reproduce the ConvE and TuckER results exactly, but not the results of the other algorithms. By "reproduce" I mean matching row R of the table, not row pub, because the paper clearly says that pub and R do not match for the algorithms in orange.

I might be missing something here, can you please help me out?

The numbers I am getting are as follows:

Dataset_algo      MRR    Hits@1  Hits@3  Hits@5  Hits@10  MR       AMR
fb15k237_conve    29.61  21.14   32.16   38.60   47.08    241.59   3.39
fb15k237_convkb   3.71   2.49    3.63    4.07    4.42     3513.93  49.24
fb15k237_mure     23.88  15.22   25.83   32.70   42.41    236.28   3.31
fb15k237_quate    -0.02  0.00    0.01    0.01    0.03     7109.19  99.63
fb15k237_rotate   33.86  24.30   37.39   44.27   53.33    171.75   2.41
fb15k237_tucker   35.53  26.24   39.10   45.83   54.29    150.12   2.10

How to reproduce

I am using Ray Tune to launch the experiments in parallel:


from typing import List
import pykeen.nn
from pykeen.pipeline import pipeline, replicate_pipeline_from_path
import torch
import ray
from ray import tune
import os
import json

dataset = ["fb15k237"]
algorithms = ["conve", "convkb", "mure", "quate", "rotate", "tucker"]

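def experiment(config):
    # Run one replicate of the experiment described by the given config file.
    json_file = os.path.join("<path_to_config_files>", config["file_name"])
    # Store results in a directory named after the config file (extension dropped).
    directory = os.path.join("./results", config["file_name"].split(".")[0])
    replicate_pipeline_from_path(json_file, replicates=1, directory=directory)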

cpus = 4
gpus = 4
num_parallel = 4

ray.init(num_cpus=cpus,
    num_gpus=gpus,
    include_dashboard=False,
    _temp_dir=os.path.expanduser('~/tmp'))

json_list = []
for d in dataset:
    for algo in algorithms:
        json_list.append(f"{d}_{algo}.json")

tune.run(experiment, config={
        "file_name": tune.grid_search(json_list),
        },
        num_samples=1,
        log_to_file=True,
        local_dir="./results",
        resources_per_trial={'cpu': cpus // num_parallel,
                            'gpu': gpus / num_parallel - 0.01})
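To rule out the Ray Tune setup as a factor, the same configurations could also be run one at a time in a single process. This is only a sketch under the same assumptions as the script above: the config files are named `<dataset>_<algo>.json`, and `replicate_pipeline_from_path` is called with the same keyword arguments. The helpers `plan_runs` and `run_sequentially` are hypothetical names introduced here for illustration.

```python
import os


def plan_runs(datasets, algorithms, config_dir, results_dir="./results"):
    """Build (config_path, output_dir) pairs using the naming from the script above."""
    runs = []
    for d in datasets:
        for algo in algorithms:
            stem = f"{d}_{algo}"
            runs.append(
                (os.path.join(config_dir, stem + ".json"),
                 os.path.join(results_dir, stem))
            )
    return runs


def run_sequentially(runs, runner=None):
    """Run each configuration one after another in the current process."""
    if runner is None:
        # Imported lazily so the planning helper works without PyKEEN installed.
        from pykeen.pipeline import replicate_pipeline_from_path as runner
    for config_path, output_dir in runs:
        runner(config_path, replicates=1, directory=output_dir)


# Example usage (placeholder path, as in the original script):
# run_sequentially(plan_runs(["fb15k237"], ["conve", "quate"], "<path_to_config_files>"))
```

If the sequential numbers match the parallel ones, the discrepancy is unlikely to come from how the trials are scheduled.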

Environment

Key Value
OS posix
Platform Linux
Release 5.4.0-131-generic
Time Wed Feb 1 08:57:18 2023
Python 3.8.16
PyKEEN 1.9.0
PyKEEN Hash UNHASHED
PyKEEN Branch
PyTorch 1.13.1+cu117
CUDA Available? true
CUDA Version 11.7
cuDNN Version 8500

Additional information

Code to replicate my environment:

conda create -n graph python=3.8
conda activate graph
pip install pykeen[transformers] --force-reinstall
pip install ipykernel
conda install -n graph ipykernel --update-deps --force-reinstall
pip install ray


kaushik333 commented 1 year ago

@cthoyt @mberr could you please help me out?

cthoyt commented 1 year ago

@kaushik333 I would suggest you start by carefully searching through existing issues and discussions on this tracker - there are other people who have had similar issues we've been able to address.

For future reference, we maintain this project in our free time and don't have time to address all issues. Demonstrating that you've put in significant effort ahead of time is a good way to get us interested; badgering us is not.

That being said, I suggest you use the same version of PyKEEN that accompanies the paper if you want to make a faithful reproduction. Many improvements have been made in the meantime that affect results.
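A quick way to check which release is installed before comparing numbers is sketched below. It uses only the standard library's `importlib.metadata`; the `EXPECTED` value is a placeholder that has to be filled in with the release that accompanies the paper, which is not stated in this thread. The helper names are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(package, expected):
    """True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected


if __name__ == "__main__":
    # Placeholder: replace with the PyKEEN release that accompanies the paper.
    EXPECTED = "<version used for the paper>"
    found = installed_version("pykeen")
    print(f"pykeen installed: {found}, expected: {EXPECTED}")
    if found != EXPECTED:
        print("Version mismatch: results may differ from the published table.")
```

In the environment reported above, this would show 1.9.0 for `pykeen`, so any difference from the paper's release is visible before a long training run starts.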