uclnlp / inferbeddings

Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation
MIT License

Decide the best ruleset for FB15k #11

Closed. pminervini closed this issue 7 years ago.

pminervini commented 7 years ago

To generate candidate rule sets, try e.g.:

$ ./tools/amie-to-clauses.py -t 0.9 data/fb15k/rules/fb15k-rules_mins=1000_minis=1000.txt
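For reference, here is a minimal sketch of the kind of confidence filtering that command performs, assuming a tab-separated AMIE output file with the rule string in the first column and the PCA confidence in the fourth; the actual `amie-to-clauses.py` may parse the file and emit clauses differently:

```python
# Sketch of confidence-threshold filtering over AMIE output (illustrative,
# not the actual amie-to-clauses.py): keep rules whose PCA confidence is
# at least the threshold given via -t.
import argparse

def filter_rules(path, threshold):
    """Yield (rule, confidence) pairs whose PCA confidence >= threshold."""
    with open(path) as f:
        for line in f:
            if line.startswith('?'):  # AMIE rule lines start with a variable
                columns = line.rstrip('\n').split('\t')
                rule, confidence = columns[0], float(columns[3])
                if confidence >= threshold:
                    yield rule, confidence

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('rules_file')
    parser.add_argument('-t', '--threshold', type=float, default=0.9)
    args = parser.parse_args()
    for rule, confidence in filter_rules(args.rules_file, args.threshold):
        print('{}\t{:.3f}'.format(rule, confidence))
```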
pminervini commented 7 years ago

All the rule sets are available here:

https://github.com/uclmr/inferbeddings/tree/master/data/fb15k/clauses

tdmeeste commented 7 years ago

Currently, @pminervini is running experiments on 4 different rule sets:

Motivation for these rule sets:

@pminervini any idea what the relation /dataworld/gardening_hint/split_to means? It appears in many rules, including high-confidence rules.

We can allow for more rules (if adding confident rules with low support turns out to help), but once we decide on a fixed rule set, we may need to filter redundant rules as discussed (see the sketch at the end of this comment).

Curious to see what it'll bring; let's discuss it in this issue.
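One simple notion of redundancy we could start from is syntactic duplication up to variable renaming; below is a purely illustrative sketch (the criterion we actually agreed on may be stronger, e.g. subsumption between rule bodies):

```python
# Hypothetical redundant-rule filter: drop rules that are identical up to
# a consistent renaming of variables. Only a starting point; a stronger
# filter could also drop rules subsumed by other rules.
import re

def canonicalize(rule):
    """Rename variables ?a, ?b, ... to ?v0, ?v1, ... in order of appearance."""
    mapping = {}
    def rename(match):
        var = match.group(0)
        if var not in mapping:
            mapping[var] = '?v{}'.format(len(mapping))
        return mapping[var]
    return re.sub(r'\?\w+', rename, rule)

def drop_duplicates(rules):
    seen, unique = set(), []
    for rule in rules:
        key = canonicalize(rule)
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique

print(drop_duplicates(['p(?a, ?b) :- q(?b, ?a)', 'p(?x, ?y) :- q(?y, ?x)']))
# ['p(?a, ?b) :- q(?b, ?a)']  -- the second rule is the same up to renaming
```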

pminervini commented 7 years ago

@tdmeeste told me to execute the following experiments: https://github.com/uclmr/inferbeddings/blob/master/scripts/fb15k/UCL_FB15K_clauses_v1.py

So far, the best results have been obtained either with clauses_highconf_highsupp.pl or with clauses_lowconf_highsupp.pl:

$ ./tools/parse_results_filtered.sh logs/ucl_fb15k_clauses_v1/*.log
144
Best MR, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_lowconf_highsupp.pl_disc_epochs=1_embedding_size=100_epochs=100_lr=0.1_margin=1_model=DistMult_optimizer=adagrad_similarity=dot.log
Test - Best Filt MR: 87.76886

Best MRR, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=0_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_lowconf_highsupp.pl_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=ComplEx_optimizer=adagrad_similarity=dot.log
Test - Best Filt MRR: 0.519

Best H@1, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=0_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_highconf_highsupp.pl_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=ComplEx_optimizer=adagrad_similarity=dot.log
Test - Best Filt Hits@1: 38.591%

Best H@3, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=0_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_lowconf_highsupp.pl_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=ComplEx_optimizer=adagrad_similarity=dot.log
Test - Best Filt Hits@3: 60.408%

Best H@5, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=0_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_lowconf_highsupp.pl_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=ComplEx_optimizer=adagrad_similarity=dot.log
Test - Best Filt Hits@5: 68.102%

Best H@10, Filt: logs/ucl_fb15k_clauses_v1/ucl_fb15k_clauses_v1.adv_batch_size=10_adv_epochs=0_adv_lr=0.1_adv_weight=1_batches=10_clausefile=clauses_lowconf_highsupp.pl_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=ComplEx_optimizer=adagrad_similarity=dot.log
Test - Best Filt Hits@10: 76.349%
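For context, the "Filt" (filtered) metrics above follow the standard filtered ranking protocol: each test triple is ranked against corrupted candidates, skipping candidates that form a known true triple anywhere in the dataset. A minimal sketch, not the repository's actual evaluation code:

```python
# Sketch of filtered ranking metrics (MR, MRR, Hits@K). For each test triple
# (s, p, o), the true object is ranked against all candidate objects, and
# candidates forming a known true triple are skipped ("filtered").
def filtered_metrics(test_triples, all_true, score, entities, ks=(1, 3, 5, 10)):
    ranks = []
    for s, p, o in test_triples:
        true_score = score(s, p, o)
        rank = 1
        for e in entities:
            if e != o and (s, p, e) not in all_true and score(s, p, e) > true_score:
                rank += 1
        ranks.append(rank)
    n = float(len(ranks))
    mr = sum(ranks) / n                          # Mean Rank
    mrr = sum(1.0 / r for r in ranks) / n        # Mean Reciprocal Rank
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}  # Hits@K
    return mr, mrr, hits
```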

Log files for the experiments are available at http://data.neuralnoise.com/inferbeddings/ucl_fb15k_clauses_v1.tar.gz

> @pminervini any idea what the relation /dataworld/gardening_hint/split_to means? It appears in many rules, including high-confidence rules.

@tdmeeste I really have no idea.

riedelcastro commented 7 years ago

What are the numbers without rules?

pminervini commented 7 years ago

> What are the numbers without rules?

adv_weight=0 was not in UCL_FB15K_clauses_v1.py; adding it right now.

riedelcastro commented 7 years ago

Isn't it better to remove --adv-lr to get rid of the adversarial training properly (or does this now happen with --adv-weight 0)?

pminervini commented 7 years ago

> Isn't it better to remove --adv-lr to get rid of the adversarial training properly (or does this now happen with --adv-weight 0)?

Yes, it happens with --adv-weight 0.
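For clarity, a schematic (with illustrative names, not the repository's actual code) of how one would expect --adv-weight to enter the objective, which is why --adv-weight 0 also makes --adv-lr irrelevant:

```python
# Schematic of the combined objective: with adv_weight = 0 the adversarial
# regulariser contributes nothing to the loss or its gradients, so the
# adversary's learning rate (--adv-lr) no longer matters and training
# reduces to the plain link-prediction objective.
def total_loss(fact_loss, adversarial_loss, adv_weight):
    return fact_loss + adv_weight * adversarial_loss

assert total_loss(0.42, 7.0, adv_weight=0.0) == 0.42  # adversary fully disabled
```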

tdmeeste commented 7 years ago

@pminervini I created several high-support rule sets for various confidence levels, based on a new high-recall AMIE+ run on FB15k that uses only the FB15k training data. If the latest results are promising (I still have to check), I can try to improve the clause files further by filtering them manually.
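To make the intended split concrete, a hypothetical sketch of bucketing high-support rules into confidence bands; the thresholds and the two-band split are illustrative, merely echoing the clause file names used above:

```python
# Hypothetical split of mined rules into clause files by confidence band,
# keeping only high-support rules. Thresholds are illustrative placeholders.
def split_by_confidence(rules, supp_min=1000, conf_high=0.9, conf_low=0.5):
    """`rules` is an iterable of (clause, support, confidence) triples."""
    high, low = [], []
    for clause, support, confidence in rules:
        if support < supp_min:
            continue  # keep only high-support rules
        if confidence >= conf_high:
            high.append(clause)
        elif confidence >= conf_low:
            low.append(clause)  # confident enough, but below the high band
    with open('clauses_highconf_highsupp.pl', 'w') as f:
        f.write('\n'.join(high) + '\n')
    with open('clauses_lowconf_highsupp.pl', 'w') as f:
        f.write('\n'.join(low) + '\n')
```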

pminervini commented 7 years ago

I think we agreed to use Guo et al.'s FB122: https://github.com/uclmr/inferbeddings/tree/master/data/guo-emnlp16/fb122