For checking the results, I've written a script that:
1. Finds the best hyperparameter settings for each metric on the validation set (in the filtered setting, as in the ComplEx paper), and
2. Reports the corresponding results on the test set.
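The two steps above can be sketched as follows. This is a minimal illustration, not the actual `parse_results_filtered.sh`; the log-line format (`Valid Filt MRR: …` / `Test Filt MRR: …`) and the function name are assumptions made for the example.

```python
import glob
import re

def best_by_validation(log_paths, metric="MRR", higher_is_better=True):
    """Pick the run whose *validation* score for `metric` is best, and
    return its log path together with the corresponding *test* score
    (model selection uses the validation set only)."""
    best_path, best_valid, best_test = None, None, None
    for path in log_paths:
        with open(path) as f:
            text = f.read()
        # Hypothetical log-line format, e.g. "Valid Filt MRR: 0.493"
        valid = re.search(r"Valid Filt {}: ([\d.]+)".format(metric), text)
        test = re.search(r"Test Filt {}: ([\d.]+)".format(metric), text)
        if not (valid and test):
            continue  # incomplete log (e.g. job still running): skip it
        v = float(valid.group(1))
        is_better = (best_valid is None or
                     (v > best_valid if higher_is_better else v < best_valid))
        if is_better:
            best_path, best_valid, best_test = path, v, float(test.group(1))
    return best_path, best_test
```

Hits@k and MRR use the default `higher_is_better=True`; Mean Rank would use `higher_is_better=False`, since lower ranks are better.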
For example, here are the results on WN18 with and without rules:
With rules:
$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*.log
1080
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 140.9154
Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.493
Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@1: 32.78%
Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 84.57%
Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 90.78%
Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 93.06%
Without rules:
$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*_adv_weight=0_*.log
180
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=0_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 146.8016
Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.372
Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=1_embedding_size=20_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt Hits@1: 16.62%
Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 60.31%
Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 70.16%
Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 79.39%
Please note that the experiments in logs/ucl_fb15k_adv_v?.2 are still running (and most log files are incomplete). Those experiments use a new rule set I'm trying for FB15k, built from clauses with higher support (minimum support 1000 instead of 100); this is related to https://github.com/uclmr/inferbeddings/issues/11
Some early results are available here:
http://data.neuralnoise.com/inferbeddings/logs_08022017.tar.gz
Just decompress the file in the inferbeddings directory. Those results are generated by jobs on the UCL CS cluster; the scripts generating the jobs have a UCL_ prefix and are available here:
https://github.com/uclmr/inferbeddings/tree/master/scripts/wn18
https://github.com/uclmr/inferbeddings/tree/master/scripts/fb15k
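For reference, extracting the archive looks like the following. In practice you would download logs_08022017.tar.gz from the URL above into the inferbeddings checkout; here a stand-in archive is fabricated first so the commands are runnable as-is, and the top-level logs/ layout is an assumption based on the paths shown earlier.

```shell
# Fabricate a stand-in archive (in practice: wget the file from the URL above).
mkdir -p stage/logs/ucl_wn18_adv_v1
echo "Test - Best Filt MRR: 0.493" > stage/logs/ucl_wn18_adv_v1/example.log
tar -czf logs_08022017.tar.gz -C stage logs
rm -rf stage

# Decompress in the repository root; the log files land under logs/
tar -xzf logs_08022017.tar.gz
ls logs/ucl_wn18_adv_v1
```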