propi / rdfrules

RDFRules: Analytical Tool for Rule Mining from RDF Knowledge Graphs
GNU General Public License v3.0

Indexing seems to remove some triples #78

Closed kliegr closed 2 years ago

kliegr commented 3 years ago

On the same dataset, `kg-covid-19_nometa_shortened_20200925.nt`, I tried several different ways of counting the triples with the predicate <interacts_with>.

 grep "> <interacts_with> <" kg-covid-19_nometa_shortened_20200925.nt | wc -l

returns 11856194 lines.

When the same file is loaded into RDFRules using the pipeline Load dataset -> Filter quads "predicate: <interacts_with>" -> Size, the result is also 11856194.

When the same file is loaded into RDFRules using the pipeline Load dataset -> Index -> To dataset -> Filter quads "predicate: <interacts_with>" -> Size, the result is 11702183.

The lower count of 11702183 also matches the body size of the rule

( ?b <interacts_with> ?a ) => ( ?a <interacts_with> ?b ) | Confidence: 0.9917529917281246, BodySize: 11702183, HeadCoverage: 0.9917529917281246, Support: 11605675, HeadSize: 11702183

mined on the indexed dataset.
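As a quick sanity check of the numbers above, the following sketch recomputes the gap between the two counts and the reported rule measures. It assumes the usual definitions (confidence = support / body size, head coverage = support / head size); these are not confirmed in this thread, and the variable names are only illustrative.

```python
import math

# Counts reported in this issue.
raw_count = 11856194      # grep / non-indexed pipeline
indexed_count = 11702183  # pipeline with the Index step

# Measures reported for the mined rule.
support = 11605675
body_size = 11702183
head_size = 11702183

# Number of <interacts_with> triples missing after indexing.
dropped = raw_count - indexed_count
print(dropped)  # → 154011

# Assumed definitions: confidence = support / body size,
# head coverage = support / head size.
confidence = support / body_size
head_coverage = support / head_size
print(math.isclose(confidence, 0.9917529917281246, rel_tol=1e-6))
print(math.isclose(head_coverage, 0.9917529917281246, rel_tol=1e-6))
```

So 154011 triples with this predicate are unaccounted for after indexing, and the rule measures are internally consistent with the lower count.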

Tasks-DifferingCount.zip

propi commented 2 years ago

Index removes all duplicate triples and all reflexive triples of the form <x> <p> <x> . That may be the reason; please check it.
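One way to check this outside RDFRules is to count duplicates and reflexive triples for the predicate directly in the N-Triples file. The sketch below is only an illustration: `triple_stats` and `TripleStats` are hypothetical names, the parser is a naive whitespace split that assumes one triple per line, and it does not reproduce how RDFRules itself builds the index.

```python
from typing import Iterable, NamedTuple


class TripleStats(NamedTuple):
    total: int       # all lines with the given predicate
    duplicates: int  # repeated (subject, object) pairs beyond the first
    reflexive: int   # distinct triples where subject == object
    remaining: int   # distinct, non-reflexive triples


def triple_stats(lines: Iterable[str],
                 predicate: str = "<interacts_with>") -> TripleStats:
    """Count duplicate and reflexive triples for one predicate.

    Naive parsing: assumes one N-Triples statement per line.
    """
    seen = set()
    total = duplicates = reflexive = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)  # subject, predicate, rest
        if len(parts) < 3 or parts[1] != predicate:
            continue
        subject = parts[0]
        obj = parts[2].rstrip()
        if obj.endswith("."):        # drop the statement terminator
            obj = obj[:-1].rstrip()
        total += 1
        if (subject, obj) in seen:
            duplicates += 1
            continue
        seen.add((subject, obj))
        if subject == obj:
            reflexive += 1
    return TripleStats(total, duplicates, reflexive, len(seen) - reflexive)


sample = [
    "<a> <interacts_with> <b> .",
    "<a> <interacts_with> <b> .",  # duplicate
    "<c> <interacts_with> <c> .",  # reflexive
    "<b> <interacts_with> <a> .",
    "<a> <other> <b> .",           # different predicate, ignored
]
stats = triple_stats(sample)
print(stats)
```

To run it on the real data, pass an open file handle for `kg-covid-19_nometa_shortened_20200925.nt` instead of `sample`. If `remaining` comes out as 11702183, duplicates and reflexive triples account for the whole difference. Keeping every distinct pair in a set takes a fair amount of memory for about 11.8 million triples.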