Unable to reproduce Table 9 of the paper for the algorithms marked in Orange -- row R

kaushik333 commented 1 year ago

Describe the bug

I am trying to reproduce Table 9 in the paper (https://arxiv.org/pdf/2006.13365.pdf). I am using the hyperparameter JSON files in the experiments folder (https://github.com/pykeen/pykeen/tree/master/src/pykeen/experiments). For example, for the ConvE algorithm and the fb15k237 dataset, I use (https://github.com/pykeen/pykeen/blob/master/src/pykeen/experiments/conve/dettmers2018_conve_fb15k237.json).

Using these hyperparameters, I am able to reproduce the ConvE and TuckER results, but not those of the other algorithms. By "reproduce" I mean row R in this table (not pub, because the paper clearly says that for the algorithms in orange, pub and R do not match).

I might be missing something here, can you please help me out?

The numbers I am getting are as follows:

Dataset_algo	MRR	Hits@1	Hits@3	Hits@5	Hits@10	MR
fb15k237_conve	29.61	21.14	32.16	38.60	47.08	241.59	3.39
fb15k237_convkb	3.71	2.49	3.63	4.07	4.42	3513.93	49.24
fb15k237_mure	23.88	15.22	25.83	32.70	42.41	236.28	3.31
fb15k237_quate	-0.02	0.00	0.01	0.01	0.03	7109.19	99.63
fb15k237_rotate	33.86	24.30	37.39	44.27	53.33	171.75	2.41
fb15k237_tucker	35.53	26.24	39.10	45.83	54.29	150.12	2.10

How to reproduce

I am using Ray Tune to launch the experiments in parallel:


from typing import List
import pykeen.nn
from pykeen.pipeline import pipeline, replicate_pipeline_from_path
import torch
import ray
from ray import tune
import os
import json

dataset = ["fb15k237"]
algorithms = ["conve", "convkb", "mure", "quate", "rotate", "tucker"]

def experiment(config):
    json_file = os.path.join("<path_to_config_files>", config["file_name"])
    replicate_pipeline_from_path(json_file, replicates=1, directory=os.path.join("./results", config["file_name"].split(".")[0]))

cpus = 4
gpus = 4
num_parallel = 4

ray.init(num_cpus=cpus,
    num_gpus=gpus,
    include_dashboard=False,
    _temp_dir=os.path.expanduser('~/tmp'))

json_list = []
for d in dataset:
    for algo in algorithms:
        json_list.append(f"{d}_{algo}.json")

tune.run(experiment, config={
        "file_name": tune.grid_search(json_list),
        },
        num_samples=1,
        log_to_file=True,
        local_dir="./results",
        resources_per_trial={'cpu': cpus // num_parallel,
                            'gpu': gpus / num_parallel - 0.01})
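
Before launching the grid, it may help to confirm that every generated file name actually resolves to a file on disk. The following is a minimal sketch, assuming the config files sit in one local directory and follow the `{dataset}_{algo}.json` naming used in the script above; `missing_configs` is a hypothetical helper, not part of PyKEEN or Ray.

```python
import os
from typing import List


def missing_configs(config_dir: str, datasets: List[str], algorithms: List[str]) -> List[str]:
    """Return the expected config file names that are not present in config_dir.

    Assumes the same "{dataset}_{algo}.json" naming as the launch script.
    """
    names = [f"{d}_{algo}.json" for d in datasets for algo in algorithms]
    return [n for n in names if not os.path.isfile(os.path.join(config_dir, n))]
```

An empty list means every file in the grid was found; any names returned would need to be fixed before calling `tune.run`.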

Environment

Key	Value
OS	posix
Platform	Linux
Release	5.4.0-131-generic
Time	Wed Feb 1 08:57:18 2023
Python	3.8.16
PyKEEN	1.9.0
PyKEEN Hash	UNHASHED
PyKEEN Branch
PyTorch	1.13.1+cu117
CUDA Available?	true
CUDA Version	11.7
cuDNN Version	8500

Additional information

Code to replicate my environment:

conda create -n graph python=3.8
conda activate graph
pip install pykeen[transformers] --force-reinstall
pip install ipykernel
conda install -n graph ipykernel --update-deps --force-reinstall
pip install ray

Issue Template Checks

[X] This is not a feature request (use a different issue template if it is)
[X] This is not a question (use the discussions forum instead)
[X] I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed

kaushik333 commented 1 year ago

@cthoyt @mberr could you please help me out?

cthoyt commented 1 year ago

@kaushik333 I would suggest you start by carefully searching through existing issues and discussions on this tracker - there are other people who have had similar issues we've been able to address.

For future reference, we maintain this project in our free time and don't have time to address all issues. Demonstrating that you've put in significant effort ahead of time is a good way to get us interested; badgering us probably is not.

That being said, I suggest you use the same version of PyKEEN that accompanies the paper if you want to make a faithful reproduction. Many improvements have been made in the meantime that affect results.
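
One way to guard a reproduction run against a version mismatch is to check the installed package version up front. This is a minimal sketch using only the standard library; `installed_version` and `matches_expected` are hypothetical helper names, and the version that accompanies the paper is not stated in this thread, so it has to be supplied by the caller.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "pykeen") -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed: Optional[str], expected: str) -> bool:
    """Compare versions, ignoring a local build suffix such as "+cu117"."""
    return installed is not None and installed.split("+")[0] == expected
```

For instance, `matches_expected(installed_version(), "1.9.0")` would hold in the environment reported above; for a faithful reproduction, the expected string would instead be whichever release accompanies the paper.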
