For checking the results, I've written a script that:
1. Finds the best hyperparameter settings for each metric on the validation set (in the filtered setting, as in the ComplEx paper), and
2. Reports the corresponding results on the test set.
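The two steps above can be sketched as follows. This is a minimal illustration, not the actual `parse_results_filtered.sh`; the log-line format (`Valid Filt MRR: …` / `Test Filt MRR: …`) and the function name are assumptions made for the example.

```python
import glob
import re

def best_by_validation(log_paths, metric="MRR", higher_is_better=True):
    """Pick the run whose *validation* score for `metric` is best, and
    return its log path together with the corresponding *test* score
    (model selection uses the validation set only)."""
    best_path, best_valid, best_test = None, None, None
    for path in log_paths:
        with open(path) as f:
            text = f.read()
        # Hypothetical log-line format, e.g. "Valid Filt MRR: 0.493"
        valid = re.search(r"Valid Filt {}: ([\d.]+)".format(metric), text)
        test = re.search(r"Test Filt {}: ([\d.]+)".format(metric), text)
        if not (valid and test):
            continue  # incomplete log (e.g. job still running): skip it
        v = float(valid.group(1))
        is_better = (best_valid is None or
                     (v > best_valid if higher_is_better else v < best_valid))
        if is_better:
            best_path, best_valid, best_test = path, v, float(test.group(1))
    return best_path, best_test
```

Hits@k and MRR use the default `higher_is_better=True`; Mean Rank would use `higher_is_better=False`, since lower ranks are better.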
For example, here are the results on WN18 with and without rules:
With rules:
$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*.log
1080
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 140.9154
Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.493
Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@1: 32.78%
Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 84.57%
Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 90.78%
Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 93.06%
Without rules:
$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*_adv_weight=0_*.log
180
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=0_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 146.8016
Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.372
Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=1_embedding_size=20_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt Hits@1: 16.62%
Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 60.31%
Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 70.16%
Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 79.39%
Please note that the experiments in logs/ucl_fb15k_adv_v?.2 are still running (and most log files are incomplete). Those experiments use a new rule set I'm trying for FB15k, built from clauses with higher support (minimum support 1000 instead of 100); this is related to https://github.com/uclmr/inferbeddings/issues/11
Some early results are available here:
http://data.neuralnoise.com/inferbeddings/logs_08022017.tar.gz
Just decompress the file in the inferbeddings directory. Those results are generated by jobs on the UCL CS cluster; the scripts generating the jobs have a UCL_ prefix and are available here:
https://github.com/uclmr/inferbeddings/tree/master/scripts/wn18
https://github.com/uclmr/inferbeddings/tree/master/scripts/fb15k
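For reference, extracting the archive looks like the following. In practice you would download logs_08022017.tar.gz from the URL above into the inferbeddings checkout; here a stand-in archive is fabricated first so the commands are runnable as-is, and the top-level logs/ layout is an assumption based on the paths shown earlier.

```shell
# Fabricate a stand-in archive (in practice: wget the file from the URL above).
mkdir -p stage/logs/ucl_wn18_adv_v1
echo "Test - Best Filt MRR: 0.493" > stage/logs/ucl_wn18_adv_v1/example.log
tar -czf logs_08022017.tar.gz -C stage logs
rm -rf stage

# Decompress in the repository root; the log files land under logs/
tar -xzf logs_08022017.tar.gz
ls logs/ucl_wn18_adv_v1
```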