haseebs / OWE

Pytorch code for An Open-World Extension to Knowledge Graph Completion Models (AAAI 2019)
https://aaai.org/ojs/index.php/AAAI/article/view/4162

Some help reproducing results of FB20k & DBPedia50k #3

Closed karthik63 closed 3 years ago

karthik63 commented 4 years ago

Hi, I'm able to reproduce the FB15k-237-OWE results reported in the paper. I'm unable to reproduce FB20k and DBpedia50k results.

  1. Can you tell me which split you used? I'm using the splits of DKRL and ConMask respectively. For FB20k, do you use the same descriptions as DKRL or the shorter Wikidata descriptions?
  2. Do I need to change any of the hyperparameters? In the paper, only dropout is mentioned.
  3. The descriptions of FB20k and DBpedia50k are quite long. Do you still just use an average encoder? Any filtering before averaging?
haseebs commented 4 years ago

Hi, I'm able to reproduce the FB15k-237-OWE results reported in the paper. I'm unable to reproduce FB20k and DBpedia50k results.

1. Can you tell me which split you used? I'm using the splits of DKRL and ConMask respectively. For FB20k, do you use the same descriptions as DKRL or the shorter Wikidata descriptions?

We use the same splits (make sure you are using the correct ones). For FB20k, we use the same descriptions (the longer ones). I have added the preprocessed version of the FB20k dataset to the readme as well.

2. Do I need to change any of the hyperparameters? In the paper, only dropout is mentioned.

Yes, you need to change the hyperparameters. Unfortunately, I no longer have the hyperparameters that produced the reported results. But for FB20k, give something like this a try:

BatchSize = 256
EmbeddingDimensionality = 300
LearningRate = 0.001
LearningRateSchedule = 8,20,300
LearningRateGammas = 0.1,0.1,0.1
InitializeEmbeddingWithAllEntities = False
TransformationType = Affine
EncoderType = Average
Loss = Pairwise
UNKType = Zero
AverageWordDropout = 0
IterTriplets = True
3. The descriptions of FB20k and DBpedia50k are quite long. Do you still just use an average encoder? Any filtering before averaging?

Yes, the average encoder seems to work best. Check out the Entity class in data.py to see the filtering that is done; it is the same for all datasets.
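
For intuition, the average encoder boils down to something like the sketch below (illustrative only, not the repo's Entity/encoder code; the function name, the vocabulary lookup, and the 300-dimensional zero fallback are assumptions):

```python
# Minimal sketch of an average encoder with word dropout (illustrative only).
import random
import torch

def average_encode(tokens, word_vectors, word_dropout=0.0, dim=300):
    """tokens: list of str; word_vectors: dict mapping token -> torch.Tensor of size dim."""
    # Keep only tokens that exist in the pretrained embedding vocabulary.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    # Optionally drop a fraction of the words during training (AverageWordDropout).
    if word_dropout > 0:
        kept = [v for v in vecs if random.random() > word_dropout]
        vecs = kept or vecs
    if not vecs:
        # Nothing matched the vocabulary: fall back to a zero vector (UNKType = Zero).
        return torch.zeros(dim)
    return torch.stack(vecs).mean(dim=0)
```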

karthik63 commented 4 years ago

Thanks man. I'll let you know in a week if I have any luck.

karthik63 commented 4 years ago

I'm still not able to reproduce the results with the same FB20k split and the configuration above. These are my results:

Epoch 101: Hits@1 38.70%, Hits@3 50.99%, Hits@10 62.30%, MRR (filtered) 46.97%, MRR (raw) 44.93%, mean rank 68, mean rank (raw) 69

haseebs commented 4 years ago

I'm still not able to reproduce the results with the same FB20k split and the configuration above. These are my results:

Epoch 101: Hits@1 38.70%, Hits@3 50.99%, Hits@10 62.30%, MRR (filtered) 46.97%, MRR (raw) 44.93%, mean rank 68, mean rank (raw) 69

I am away right now. I will try to figure out the problem and send you the hyperparameters that reproduce the paper result in a few days.

karthik63 commented 4 years ago

Thank you

villmow commented 4 years ago

Hi @karthik63,
could you try the following config for FB20k:

EDIT: see post below

haseebs commented 4 years ago

Sorry for the delay. You should ignore the previous config; it belongs to an old version of the code. Try the one below, both with Loss=Cosine and with Loss=Pairwise:

[GPU]
DisableCuda = False
GPUs = 0

[Training]
; Which KGC model to use: ComplEx, TransE, TransR, DistMult
LinkPredictionModelType = ComplEx
Epochs = 100
BatchSize = 128
; Dimensionality of Embedding file is used, if one is given
EmbeddingDimensionality = 300
LearningRate = 0.001
#LearningRate = 2e-5
LearningRateSchedule = 40,80,120
LearningRateGammas = 0.5,0.5,0.5
InitializeEmbeddingWithAllEntities = False
; Whether we want to initialize with embeddings obtained from OpenKE
; These are read from the embedding subdir
InitializeWithPretrainedKGCEmbedding = True
; Type of OWE transformation to use: Linear, Affine, FCN
TransformationType = Affine
; Type of OWE encoder to use: Average, LSTM
EncoderType = Average
; Whether we use only heads or heads+tails during optimization (tail prediction)
UseTailsToOptimize = False
; Which loss to use: Pairwise (Euclidean) or Cosine
Loss = Pairwise
; What to use as an UNK token: Zero, Average, TODO
UNKType = Zero
; How much word dropout to use
AverageWordDropout = 0
; What should be iterated during training: triplets or entities
IterTriplets = False
EqualPaddingAcrossAllBatches = True
MaxSequenceLength = 512
Optimizer = Adam
;GradientAccumulationSteps = 1
;SchedulerType = WarmupLinearScheduler
;SchedulerWarmupSteps = 4000

[FCN]
FCNUseSigmoid = False
FCNLayers = 1
FCNDropout = 0.5
FCNHiddenDim = 300

[LSTM]
LSTMOutputDim = 300
LSTMBidirectional = False

[Evaluation]
; Note: most of the options below will slow down the training if true
ValidateEvery = 1
; Whether we should use Target Filtering (from ConMask)
UseTargetFilteringShi = True
; Prints nearest neighbour entities to the test entities
PrintTestNN = False
; Prints nearest neighbour entities to the training entities
PrintTrainNN = False
; Baseline where evaluation is done by randomly corrupting heads
EvalRandomHeads = False
; Calculate mean nearest neighbour rank
CalculateNNMeanRank = False
; Target filtering baseline from ConMask
ShiTargetFilteringBaseline = False
; Generate embeddings to view using tensorboard projector
GetTensorboardEmbeddings = False

[EarlyStopping]
EarlyStopping = True
EarlyStoppingThreshold = 0.001
EarlyStoppingLastX = 10
EarlyStoppingMinEpochs = 30

[Entity2Text]
; Path to the pretrained word embeddings
PretrainedEmbeddingFile = /data/dok/johannes/pretrained_embeddings/wikipedia2vec/enwiki_20180420_300d.bin 
; Whether we should read the entity data from the entity2wikidata file
ConvertEntities = True
ConvertEntitiesWithMultiprocessing = True
; Tries to convert entity to a single token and match that token in embedding.
; Uses wikipedia link suffix as token.
; Fallback is to avg all lemmas.
MatchTokenInEmbedding = False
; Tries to convert entity into single token and match that token in embedding.
; This one uses label of the entity where spaces are replaced by underscores.
MatchLabelInEmbedding = False

[Dataset]
TrainFile = train.txt
ValidationFile = valid_zero.txt
TestFile = test_zero.txt
SkipHeader = False
; TAB or SPACE
SplitSymbol = TAB

Let us know whether it works.
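
For reference, the two loss options named in the config correspond roughly to the sketch below (an illustration of the general idea, not the exact implementation in this repository; `mapped` stands for the transformed text embedding after the Affine map, `target` for the pretrained KGC entity embedding):

```python
# Illustrative sketch of the Pairwise (Euclidean) and Cosine loss options.
import torch
import torch.nn.functional as F

def euclidean_loss(mapped, target):
    # Mean Euclidean distance between mapped text embeddings and KGC embeddings.
    return F.pairwise_distance(mapped, target).mean()

def cosine_loss(mapped, target):
    # One minus cosine similarity, averaged over the batch.
    return (1.0 - F.cosine_similarity(mapped, target)).mean()
```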

karthik63 commented 4 years ago

Thank you. Can you give me a particular grid search to run on any new dataset so I can report the results? How much tuning of the learning rate schedule is needed? I'll let you know if these parameters work in a few days.

haseebs commented 4 years ago

Thank you. Can you give me a particular grid search to run on any new dataset so I can report the results? How much tuning of the learning rate schedule is needed? I'll let you know if these parameters work in a few days.

You may want to try changing Loss (2 values), BatchSize (128, 256, 512, 1024), UNKType (2), AverageWordDropout (0 and 10% maybe?), UseTailsToOptimize (2) and IterTriplets (2). For learning rate, try a few values between 0.01 and 0.001.

If you are short on time, you could leave UNKType, UseTailsToOptimize and IterTriplets at their default values as set in the parameters above. They seemed to work well for us, but it's hard to say whether they will be best for every dataset out there.
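
If it helps, that grid can be enumerated with something like the sketch below (the option names mirror the config keys; writing each combination into a config file and launching the run is left out):

```python
# Sketch of a grid search over the options discussed above (illustrative only).
from itertools import product

losses = ["Pairwise", "Cosine"]
batch_sizes = [128, 256, 512, 1024]
unk_types = ["Zero", "Average"]
word_dropouts = [0.0, 0.1]
use_tails = [False, True]
iter_triplets = [False, True]
learning_rates = [0.001, 0.003, 0.01]

for loss, bs, unk, wd, tails, it, lr in product(
        losses, batch_sizes, unk_types, word_dropouts,
        use_tails, iter_triplets, learning_rates):
    run_config = {
        "Loss": loss, "BatchSize": bs, "UNKType": unk,
        "AverageWordDropout": wd, "UseTailsToOptimize": tails,
        "IterTriplets": it, "LearningRate": lr,
    }
    # Write run_config into the [Training] section of a config file
    # and start one training run per combination.
    print(run_config)
```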

During your experiments, you may notice that after a certain epoch the validation performance stops improving for the remaining epochs. You should set the scheduler to reduce the learning rate after that epoch.
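
In PyTorch terms, the LearningRateSchedule / LearningRateGammas pair behaves like a standard multi-step decay, so place the first milestone at (or shortly after) the epoch where validation flattens. A rough sketch (not the repo's training loop; since the gammas in the config above are identical, they collapse to a single decay factor here):

```python
# Sketch: mapping LearningRateSchedule = 40,80,120 and LearningRateGammas = 0.5,0.5,0.5
# onto torch.optim.lr_scheduler.MultiStepLR (illustrative only).
import torch

model = torch.nn.Linear(300, 300)  # placeholder for the OWE transformation
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 80, 120], gamma=0.5)

for epoch in range(100):
    # train_one_epoch(...) and validate(...) would go here
    scheduler.step()
```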