uclnlp / inferbeddings

Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation
MIT License

Experiments & Datasets #9

Closed pminervini closed 7 years ago

pminervini commented 7 years ago

Note: experiments on FB15k can be slow (a single run takes several hours). For example, try this:

$ python3 ./bin/adv-cli.py --train data/fb15k/freebase_mtr100_mte100-train.txt --valid data/fb15k/freebase_mtr100_mte100-valid.txt --test data/fb15k/freebase_mtr100_mte100-test.txt --clauses data/fb15k/clauses/clauses_0.999.pl --nb-epochs 100 --lr 0.1 --nb-batches 10 --model TransE --similarity l2 --margin 1 --embedding-size 150 --adv-lr 0.1 --adv-init-ground --adversary-epochs 0 --discriminator-epochs 10 --adv-weight 1000 --adv-batch-size 1

Consider using other datasets, e.g. YAGO or DBpedia.

riedelcastro commented 7 years ago

How about using our NYT data? Evaluation is more tedious, but it's smaller and can be used to compare against NAACL and EMNLP...

On 6 Feb 2017, at 13:11, Pasquale Minervini notifications@github.com wrote:


pminervini commented 7 years ago

@riedelcastro are you referring to https://www.dropbox.com/s/5iulumlihydo1k7/naacl2013.txt.zip?dl=1 ?

I discussed this a bit with @rockt and @tdmeeste. IIRC one concern was the lack of a validation set, but I think the hyperparameters can be tuned by cross-validation on the training set.
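The cross-validation idea could be sketched roughly as follows. This is only an illustration: `k_fold_splits`, `cross_validate` and the `evaluate` callback are hypothetical names, not part of inferbeddings, and `evaluate` stands in for whatever training and ranking evaluation is actually used.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def k_fold_splits(triples: List[Triple], k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[Triple], List[Triple]]]:
    """Yield (train, held_out) pairs; every triple is held out exactly once."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, folds[i]


def cross_validate(triples: List[Triple], configs: List[Dict],
                   evaluate: Callable[[List[Triple], List[Triple], Dict], float],
                   k: int = 5) -> Tuple[Dict, float]:
    """Return the config with the best mean held-out score (higher is better).

    `evaluate(train, held_out, config)` is a hypothetical callback that trains
    a model on `train` and returns a score measured on `held_out`.
    """
    best_config, best_score = None, float("-inf")
    for config in configs:
        scores = [evaluate(train, held_out, config)
                  for train, held_out in k_fold_splits(triples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score
```

One caveat with this kind of split on a knowledge graph: an entity that only occurs in the held-out fold has no trained embedding, so folds may need to be checked for that.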

riedelcastro commented 7 years ago

Yes, that's the one. And yes, as I said, it's a bit harder to evaluate and tune for. On the other hand, it's smaller and possibly easier to do well on, but still an "established dataset". Trade-off...

pminervini commented 7 years ago

Also, at the moment we are not doing any experiments on zero-shot learning, but (especially after Tim's talk yesterday) I think it's extremely important and relevant in this context.

As a side note, we can also consider YAGO, DBpedia and, if we want to go organic, Bio2RDF/DrugBank.

pminervini commented 7 years ago

Rules with very high support in YAGO - @tdmeeste @rockt do you think we can consider this dataset as well?

$ ./tools/amie-to-clauses.py -B 10000 data/yago3_mte10_5k/rules/yago3_mte10-rules.txt 
livesIn(X0, X1) :- wasBornIn(X0, X1)
livesIn(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
livesIn(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
livesIn(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
dealsWith(X0, X2) :- isLocatedIn(X0, X1), dealsWith(X1, X2)
isPoliticianOf(X0, X2) :- diedIn(X0, X1), hasCapital(X2, X1)
isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
isPoliticianOf(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
isPoliticianOf(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
isPoliticianOf(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), isLocatedIn(X1, X2)
isLocatedIn(X0, X2) :- isLocatedIn(X0, X1), hasCapital(X2, X1)
isLocatedIn(X0, X2) :- isLocatedIn(X0, X1), isLocatedIn(X1, X2)
isLocatedIn(X0, X2) :- isLocatedIn(X1, X0), isLocatedIn(X1, X2)
hasAcademicAdvisor(X0, X1) :- influences(X1, X0)
isAffiliatedTo(X0, X1) :- playsFor(X0, X1)
hasCapital(X0, X1) :- isLocatedIn(X0, X1)
hasCurrency(X0, X2) :- isLocatedIn(X0, X1), hasCurrency(X1, X2)
diedIn(X0, X1) :- wasBornIn(X0, X1)
diedIn(X0, X2) :- playsFor(X0, X1), isLocatedIn(X1, X2)
diedIn(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
hasOfficialLanguage(X0, X2) :- isLocatedIn(X0, X1), hasOfficialLanguage(X1, X2)
isConnectedTo(X0, X1) :- isConnectedTo(X1, X0)
isCitizenOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
isCitizenOf(X0, X2) :- diedIn(X0, X1), hasCapital(X2, X1)
isCitizenOf(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
isCitizenOf(X0, X2) :- wasBornIn(X0, X1), isLocatedIn(X1, X2)
isCitizenOf(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
isCitizenOf(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
isCitizenOf(X0, X2) :- actedIn(X0, X1), isLocatedIn(X1, X2)
playsFor(X0, X1) :- isAffiliatedTo(X0, X1)

tdmeeste commented 7 years ago

Definitely! Nice rules! But maybe we should also set a minimum confidence in addition to the minimum support. (And maybe 10000 is a bit high: it excludes many rules that might still contribute non-negligibly to dev/test facts.) For example, isPoliticianOf rules such as:

isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)

It would certainly have a high support (many persons/countries/capitals would satisfy the body), but the head is probably satisfied in only a few cases (very low confidence; only for persons who are also politicians), and it seems to me that the rule doesn't make any sense. What was the confidence for that rule? What if you only retain rules with confidence >= 0.9 or so?
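To make the distinction concrete, here is a toy sketch of the usual AMIE-style measures for that rule. The facts are invented for illustration, and the reading of the measures (body size = pairs satisfying the body; standard confidence = fraction of those pairs that also satisfy the head; head coverage = fraction of head facts the rule predicts) is the standard one rather than something confirmed for this repository's scripts.

```python
# Toy knowledge base; every fact here is made up for illustration.
facts = {
    ("alice", "wasBornIn", "paris"),
    ("bob", "wasBornIn", "paris"),
    ("carol", "wasBornIn", "rome"),
    ("dave", "wasBornIn", "lyon"),
    ("france", "hasCapital", "paris"),
    ("italy", "hasCapital", "rome"),
    ("alice", "isPoliticianOf", "france"),
    ("erin", "isPoliticianOf", "italy"),
}


def pairs(relation):
    """All (subject, object) pairs stored for one relation."""
    return {(s, o) for s, r, o in facts if r == relation}


# Rule: isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
# Join the two body atoms on the shared variable X1 (the city).
countries_by_capital = {}
for country, city in pairs("hasCapital"):
    countries_by_capital.setdefault(city, set()).add(country)

body_pairs = {
    (person, country)
    for person, city in pairs("wasBornIn")
    for country in countries_by_capital.get(city, ())
}
head_pairs = pairs("isPoliticianOf")

body_size = len(body_pairs)                 # groundings that satisfy the body
positives = len(body_pairs & head_pairs)    # ... and also satisfy the head
std_confidence = positives / body_size
head_coverage = positives / len(head_pairs)

print(body_size, positives, std_confidence, head_coverage)
```

Even in this tiny example the body fires for most people born in a capital, while the head holds for only one of them, which is the "many body groundings, low confidence" pattern described above.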

pminervini commented 7 years ago

Here are the rules with support at least 10000, with all the corresponding scores:

$ ./tools/amie-to-clauses.py -B 10000 data/yago3_mte10_5k/rules/yago3_mte10-rules.txt -s
isConnectedTo(X0, X1) :- isConnectedTo(X1, X0)
Head Coverage: 0.662486352  Std Confidence: 0.662486352 Body Size: 32055.0  PCA Body Size: 31605.0

isAffiliatedTo(X0, X1) :- playsFor(X0, X1)
Head Coverage: 0.746015736  Std Confidence: 0.868620415 Body Size: 321024.0 PCA Body Size: 294723.0

hasAcademicAdvisor(X0, X1) :- influences(X1, X0)
Head Coverage: 0.067833698  Std Confidence: 0.005788982 Body Size: 10710.0  PCA Body Size: 336.0

isLocatedIn(X0, X2) :- isLocatedIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.027032209  Std Confidence: 0.145211123 Body Size: 16507.0  PCA Body Size: 16507.0

isLocatedIn(X0, X2) :- isLocatedIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.387066943  Std Confidence: 0.230260907 Body Size: 149057.0 PCA Body Size: 149057.0

isLocatedIn(X0, X2) :- isLocatedIn(X1, X0), isLocatedIn(X1, X2)
Head Coverage: 0.174271472  Std Confidence: 0.146252129 Body Size: 105660.0 PCA Body Size: 97212.0

playsFor(X0, X1) :- isAffiliatedTo(X0, X1)
Head Coverage: 0.868620415  Std Confidence: 0.746015736 Body Size: 373783.0 PCA Body Size: 337862.0

hasCurrency(X0, X2) :- isLocatedIn(X0, X1), hasCurrency(X1, X2)
Head Coverage: 0.082568807  Std Confidence: 0.000559841 Body Size: 16076.0  PCA Body Size: 16.0

hasOfficialLanguage(X0, X2) :- isLocatedIn(X0, X1), hasOfficialLanguage(X1, X2)
Head Coverage: 0.181208054  Std Confidence: 0.003883495 Body Size: 13905.0  PCA Body Size: 78.0

hasCapital(X0, X1) :- isLocatedIn(X0, X1)
Head Coverage: 0.38659392   Std Confidence: 0.011187297 Body Size: 88672.0  PCA Body Size: 4236.0

dealsWith(X0, X2) :- isLocatedIn(X0, X1), dealsWith(X1, X2)
Head Coverage: 0.023041475  Std Confidence: 0.000197043 Body Size: 152251.0 PCA Body Size: 179.0

isPoliticianOf(X0, X2) :- diedIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.184466019  Std Confidence: 0.015589591 Body Size: 25594.0  PCA Body Size: 3001.0

isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.165048544  Std Confidence: 0.004652799 Body Size: 76728.0  PCA Body Size: 2591.0

isPoliticianOf(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.335644938  Std Confidence: 0.04109589  Body Size: 17666.0  PCA Body Size: 3553.0

isPoliticianOf(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.262598243  Std Confidence: 0.016999372 Body Size: 33413.0  PCA Body Size: 2459.0

isPoliticianOf(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.246417013  Std Confidence: 0.00410752  Body Size: 129762.0 PCA Body Size: 2462.0

isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.445214979  Std Confidence: 0.006417903 Body Size: 150049.0 PCA Body Size: 3710.0

livesIn(X0, X1) :- wasBornIn(X0, X1)
Head Coverage: 0.045637584  Std Confidence: 0.0030237   Body Size: 44978.0  PCA Body Size: 1290.0

livesIn(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.223825503  Std Confidence: 0.037756142 Body Size: 17666.0  PCA Body Size: 4643.0

livesIn(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.079865772  Std Confidence: 0.007122976 Body Size: 33413.0  PCA Body Size: 1459.0

livesIn(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.055704698  Std Confidence: 0.001279265 Body Size: 129762.0 PCA Body Size: 959.0

isCitizenOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.126193922  Std Confidence: 0.005682411 Body Size: 76728.0  PCA Body Size: 4256.0

isCitizenOf(X0, X2) :- diedIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.120694645  Std Confidence: 0.016292881 Body Size: 25594.0  PCA Body Size: 3816.0

isCitizenOf(X0, X2) :- graduatedFrom(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.376845152  Std Confidence: 0.073700894 Body Size: 17666.0  PCA Body Size: 6671.0

isCitizenOf(X0, X2) :- wasBornIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.483646889  Std Confidence: 0.011136362 Body Size: 150049.0 PCA Body Size: 7528.0

isCitizenOf(X0, X2) :- diedIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.273227207  Std Confidence: 0.028252477 Body Size: 33413.0  PCA Body Size: 4261.0

isCitizenOf(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.060781476  Std Confidence: 0.001618347 Body Size: 129762.0 PCA Body Size: 1462.0

isCitizenOf(X0, X2) :- actedIn(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.032416787  Std Confidence: 0.009928198 Body Size: 11281.0  PCA Body Size: 303.0

diedIn(X0, X1) :- wasBornIn(X0, X1)
Head Coverage: 0.122404844  Std Confidence: 0.02516786  Body Size: 44978.0  PCA Body Size: 6501.0

diedIn(X0, X2) :- playsFor(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.026492215  Std Confidence: 0.003196097 Body Size: 76656.0  PCA Body Size: 1678.0

diedIn(X0, X2) :- isAffiliatedTo(X0, X1), isLocatedIn(X1, X2)
Head Coverage: 0.070069204  Std Confidence: 0.004993758 Body Size: 129762.0 PCA Body Size: 4187.0

riedelcastro commented 7 years ago

Generally, I think that unless we support weight learning (and model expectations more like in traditional GAN), we can't really do soft rules that hold sometimes, but not all the time.

S

On Wed, Feb 15, 2017 at 1:56 PM, Pasquale Minervini < notifications@github.com> wrote:


pminervini commented 7 years ago

> Generally, I think that unless we support weight learning (and model expectations more like in traditional GAN), we can't really do soft rules that hold sometimes, but not all the time.

So we stick to rules with high "Std Confidence", is that right?

For WordNet the following works (which is basically what we are using):

$ ./tools/amie-to-clauses.py data/wn18/rules/wn18-rules.txt -B 1000 -C 0.9
_instance_hypernym(X0, X1) :- _instance_hyponym(X1, X0)
_member_meronym(X0, X1) :- _member_holonym(X1, X0)
_hyponym(X0, X1) :- _hypernym(X1, X0)
_synset_domain_topic_of(X0, X1) :- _member_of_domain_topic(X1, X0)
_hypernym(X0, X1) :- _hyponym(X1, X0)
_member_holonym(X0, X1) :- _member_meronym(X1, X0)
_derivationally_related_form(X0, X1) :- _derivationally_related_form(X1, X0)
_instance_hyponym(X0, X1) :- _instance_hypernym(X1, X0)
_has_part(X0, X1) :- _part_of(X1, X0)
_member_of_domain_topic(X0, X1) :- _synset_domain_topic_of(X1, X0)
_verb_group(X0, X1) :- _verb_group(X1, X0)
_part_of(X0, X1) :- _has_part(X1, X0)

tdmeeste commented 7 years ago

Indeed, but the high-support YAGO rules that don't make sense have a very low confidence (< 0.1). To be safe, let's try with -B 1000 -C 0.8 or so.
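As a rough check of that observation, the scored listings in this thread can be parsed and filtered outside the tool. The sketch below is independent of `amie-to-clauses.py`: the function names are hypothetical, the sample text is copied from the listings in this thread, and it assumes that Std Confidence multiplied by Body Size gives the number of body groundings whose head also holds. Judging only from the printed numbers, `-B` looks like a threshold on the Body Size column and `-C` one on Std Confidence, but that is an inference, not something stated here.

```python
import re

# Three scored rules copied from listings in this thread.
SCORED_OUTPUT = """
isAffiliatedTo(X0, X1) :- playsFor(X0, X1)
Head Coverage: 0.746015736  Std Confidence: 0.868620415 Body Size: 321024.0 PCA Body Size: 294723.0

isPoliticianOf(X0, X2) :- wasBornIn(X0, X1), hasCapital(X2, X1)
Head Coverage: 0.165048544  Std Confidence: 0.004652799 Body Size: 76728.0  PCA Body Size: 2591.0

hasNeighbor(X0, X1) :- hasNeighbor(X1, X0)
Head Coverage: 0.990990991  Std Confidence: 0.990990991 Body Size: 555.0    PCA Body Size: 554.0
"""

# Matches the confidence and the plain "Body Size" that directly follows it.
SCORE_RE = re.compile(r"Std Confidence:\s*([0-9.]+)\s+Body Size:\s*([0-9.]+)")


def parse_scored_rules(text):
    """Pair each clause line with the score line that follows it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rules = []
    for clause, scores in zip(lines[0::2], lines[1::2]):
        match = SCORE_RE.search(scores)
        if match is None:
            continue  # not a score line; skip rather than guess
        confidence = float(match.group(1))
        body_size = float(match.group(2))
        rules.append({
            "clause": clause,
            "confidence": confidence,
            "body_size": body_size,
            # Assumed: groundings where both body and head hold.
            "positives": round(confidence * body_size),
        })
    return rules


def filter_rules(rules, min_body_size, min_confidence):
    """Keep rules that pass both thresholds."""
    return [r for r in rules
            if r["body_size"] >= min_body_size and r["confidence"] >= min_confidence]


rules = parse_scored_rules(SCORED_OUTPUT)
for rule in filter_rules(rules, min_body_size=100, min_confidence=0.8):
    print(rule["clause"], rule["positives"])
```

Under these assumptions the isPoliticianOf rule has tens of thousands of body groundings but only a few hundred that also satisfy the head, so a confidence threshold removes it while a body-size threshold alone does not.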

pminervini commented 7 years ago

Increasing the confidence threshold makes the rule set tiny, but I think that is still OK because the number of predicates is low (|R| = 37):

$ ./tools/amie-to-clauses.py data/yago3_mte10_5k/rules/yago3_mte10-rules.txt -B 100 -C 0.8 -s
isAffiliatedTo(X0, X1) :- playsFor(X0, X1)
Head Coverage: 0.746015736  Std Confidence: 0.868620415 Body Size: 321024.0 PCA Body Size: 294723.0

hasNeighbor(X0, X1) :- hasNeighbor(X1, X0)
Head Coverage: 0.990990991  Std Confidence: 0.990990991 Body Size: 555.0    PCA Body Size: 554.0

hasNeighbor(X0, X2) :- dealsWith(X1, X0), hasNeighbor(X1, X2)
Head Coverage: 0.293693694  Std Confidence: 0.993902439 Body Size: 164.0    PCA Body Size: 164.0

hasGender(X0, X2) :- hasAcademicAdvisor(X0, X1), hasGender(X1, X2)
Head Coverage: 0.011320527  Std Confidence: 0.946902655 Body Size: 791.0    PCA Body Size: 779.0

hasGender(X0, X2) :- influences(X1, X0), hasGender(X1, X2)
Head Coverage: 0.036727476  Std Confidence: 0.838509317 Body Size: 2898.0   PCA Body Size: 2850.0

isMarriedTo(X0, X1) :- isMarriedTo(X1, X0)
Head Coverage: 0.969922811  Std Confidence: 0.969922811 Body Size: 3757.0   PCA Body Size: 3700.0

isLocatedIn(X0, X2) :- hasCapital(X1, X0), isLocatedIn(X1, X2)
Head Coverage: 0.010138488  Std Confidence: 0.90625 Body Size: 992.0    PCA Body Size: 990.0

riedelcastro commented 7 years ago

Yes, this makes sense. Most deterministic rules look like this. We can try noisy rules, but this may lead to problems due to us not using expectations and samples.
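One way to picture the problem with noisy rules is the minimal sketch below. It assumes, and this is not stated in the thread and may differ from what inferbeddings implements, that a clause is enforced as a hard constraint through a hinge penalty max(0, score(body) - score(head)), with a conjunctive body scored by its weakest atom. The function name and the scores are invented for illustration.

```python
def clause_violation(head_score, body_scores):
    """Hinge penalty for `head :- body`: positive when the body outscores the head.

    Assumption: a conjunctive body is scored by its lowest-scoring atom.
    """
    return max(0.0, min(body_scores) - head_score)


# Grounding where the rule really holds: the head is scored highly as well,
# so there is nothing to penalise.
consistent = clause_violation(head_score=0.95, body_scores=[0.9, 0.95])

# Counterexample grounding: the body is true but the head is false, and the
# model gives the head a low score, which is correct. The hard constraint
# still penalises it and so pushes the head score up.
counterexample = clause_violation(head_score=0.1, body_scores=[0.9, 0.95])

print(consistent, counterexample)
```

For a rule with Std Confidence close to 1, counterexample groundings like the second one are rare. For a rule with confidence around 0.005 they are the vast majority, so a hard penalty of this kind would mostly push the model toward false facts.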

pminervini commented 7 years ago

Similar rules for DBpedia (Music fragment):

$ ./tools/amie-to-clauses.py data/music_mte10_5k/rules/music_2015-10_mte10-rules.txt -B 100 -C 0.8
<http://dbpedia.org/ontology/musicalArtist>(X0, X1) :- <http://dbpedia.org/ontology/musicalBand>(X0, X1)
<http://dbpedia.org/ontology/musicalBand>(X0, X1) :- <http://dbpedia.org/ontology/musicalArtist>(X0, X1)
<http://dbpedia.org/ontology/associatedBand>(X0, X1) :- <http://dbpedia.org/ontology/associatedMusicalArtist>(X0, X1)
<http://dbpedia.org/ontology/associatedBand>(X0, X2) :- <http://dbpedia.org/ontology/associatedBand>(X1, X0), <http://dbpedia.org/ontology/associatedMusicalArtist>(X2, X1)
<http://dbpedia.org/ontology/associatedBand>(X0, X2) :- <http://dbpedia.org/ontology/associatedMusicalArtist>(X1, X0), <http://dbpedia.org/ontology/associatedMusicalArtist>(X2, X1)
<http://dbpedia.org/ontology/associatedMusicalArtist>(X0, X1) :- <http://dbpedia.org/ontology/associatedBand>(X0, X1)

This dataset (extracted from an older version of DBpedia) was also used here (an application of RESCAL for querying probabilistic KBs): http://iswc2015.semanticweb.org/sites/iswc2015.semanticweb.org/files/93660577.pdf

pminervini commented 7 years ago

Datasets from Guo et al.'s EMNLP16 paper (thanks @rockt) are available here: https://github.com/uclmr/inferbeddings/tree/master/data/guo-emnlp16