pykeen / benchmarking

📊 Results from the reproducibility and benchmarking studies presented in "Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework" (http://arxiv.org/abs/2006.13365)
MIT License

Not able to reproduce numbers for the FB15k-237 dataset with the TuckER model (probably I'm missing something) #28

Closed: kaushik333 closed this issue 1 year ago

kaushik333 commented 1 year ago

Hi, thanks for the really nice work. I am having some trouble reproducing the numbers in the table for the FB15k-237 dataset and the TuckER model (probably I'm missing something). Can you please help me out? Details follow.

This is the code I am using; I retrieved the hyperparameters for the run from here:

from pykeen.pipeline import pipeline

result = pipeline(
    dataset="fb15k237",
    dataset_kwargs=dict(
        create_inverse_triples=True,  # the published TuckER configuration trains with inverse triples
    ),
    model="TuckER",
    model_kwargs=dict(
        embedding_dim=200,
        relation_dim=200,
        dropout_0=0.3,
        dropout_1=0.4,
        dropout_2=0.5,
        apply_batch_normalization=True,
    ),
    optimizer="Adam",
    optimizer_kwargs=dict(
        lr=0.0005,
    ),
    loss="BCEAfterSigmoid",
    loss_kwargs=dict(
        reduction="mean",
    ),
    training_loop="LCWA",
    training_kwargs=dict(
        num_epochs=100,
        batch_size=1280,
        label_smoothing=0.1,
    ),
    lr_scheduler="ExponentialLR",
    lr_scheduler_kwargs=dict(
        gamma=1.0,  # gamma=1.0 keeps the learning rate constant
    ),
    evaluator_kwargs=dict(
        filtered=True,  # filtered rank-based evaluation
    ),
    device="cuda:1",
)

result.save_to_directory('fb15k237_tucker')

My losses seem to start from a much higher value than what is reported in the GitHub repo's reproducibility folder. Are these the right hyperparameters to reproduce the numbers in the paper? I am trying to reproduce Table 9.

Thanks in advance.

cthoyt commented 1 year ago

Hi @kaushik333, can you please resubmit this issue following the predefined template, including the full information for reproducibility as requested? Thanks.

kaushik333 commented 1 year ago

Closing this issue, as batch_size somehow seems to make a huge difference. I had used a large batch_size to make full use of GPU memory, but that does not allow me to replicate the reported accuracies. Setting it to 128 and num_epochs to 500, with all other configurations the same, the results are reproducible.
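For reference, a minimal sketch of the corrected call, assuming everything else from the snippet above is kept as-is; only batch_size and num_epochs differ:

from pykeen.pipeline import pipeline

result = pipeline(
    dataset="fb15k237",
    dataset_kwargs=dict(create_inverse_triples=True),
    model="TuckER",
    model_kwargs=dict(
        embedding_dim=200,
        relation_dim=200,
        dropout_0=0.3,
        dropout_1=0.4,
        dropout_2=0.5,
        apply_batch_normalization=True,
    ),
    optimizer="Adam",
    optimizer_kwargs=dict(lr=0.0005),
    loss="BCEAfterSigmoid",
    loss_kwargs=dict(reduction="mean"),
    training_loop="LCWA",
    training_kwargs=dict(
        num_epochs=500,  # was 100
        batch_size=128,  # was 1280; the large batch prevented reproduction
        label_smoothing=0.1,
    ),
    lr_scheduler="ExponentialLR",
    lr_scheduler_kwargs=dict(gamma=1.0),
    evaluator_kwargs=dict(filtered=True),
    device="cuda:1",
)

result.save_to_directory('fb15k237_tucker')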