propi / rdfrules

RDFRules: Analytical Tool for Rule Mining from RDF Knowledge Graphs
GNU General Public License v3.0

Mining rules for KG completion - confidence threshold within mining operator #71

Open kliegr opened 3 years ago

kliegr commented 3 years ago

When mining rules for KG completion, it would be useful to have a "mode" for finding rules suitable for predicting missing relations. It might suffice to have a feature that imposes an upper limit on confidence. For example, in some tasks the first 10,000 rules can all have a confidence of 100%. A rule with 100% confidence yields no new predictions, because every triple it predicts is already in the knowledge graph, so such rules are not suitable for prediction.
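To make the point about 100% confidence concrete, here is a minimal sketch on a toy graph. It assumes the standard confidence (support divided by the number of body instantiations) and a single-atom rule of the form `(?x head ?y) <= (?x body ?y)`. The triples, predicate names and the `evaluate_rule` helper are all made up for illustration and are not part of RDFRules.

```python
def evaluate_rule(triples, body_pred, head_pred):
    """Evaluate the rule (?x head_pred ?y) <= (?x body_pred ?y) on a set of triples.

    Returns the standard confidence and the set of predicted triples
    that are not yet present in the graph.
    """
    body_pairs = {(s, o) for s, p, o in triples if p == body_pred}
    head_pairs = {(s, o) for s, p, o in triples if p == head_pred}
    # Support: body instantiations for which the head already holds.
    support = len(body_pairs & head_pairs)
    confidence = support / len(body_pairs) if body_pairs else 0.0
    # New predictions: body instantiations whose head triple is missing.
    new_predictions = {(s, head_pred, o) for s, o in body_pairs - head_pairs}
    return confidence, new_predictions


# Hypothetical toy knowledge graph.
kg = {
    ("alice", "worksIn", "prague"),
    ("alice", "livesIn", "prague"),
    ("bob", "worksIn", "brno"),
    ("bob", "livesIn", "brno"),
    ("carol", "worksIn", "prague"),
}

# livesIn <= worksIn: confidence below 1.0, one missing triple is predicted.
conf_a, preds_a = evaluate_rule(kg, "worksIn", "livesIn")

# worksIn <= livesIn: confidence exactly 1.0, nothing new is predicted.
conf_b, preds_b = evaluate_rule(kg, "livesIn", "worksIn")
```

In this toy case the first rule has confidence 2/3 and predicts `("carol", "livesIn", "prague")`, while the second has confidence 1.0 and an empty prediction set.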

RDFRules allows filtering rules by confidence using the Filter operator. The problem is that in some KG completion tasks, the vast majority of generated rules may have a confidence of 100%, and I suspect that holding all of these rules in memory until the filter runs could lead to an out-of-memory error.

For example, in the attached task, the mining generates a 250 MB ruleset cache file containing 3,474,110 rules in five minutes. Without the time limit, the mining would eventually run out of memory. After applying the filtering operator on confidence ("<1.0"), only 449 rules remain. For KG completion, only the rules with confidence below 1.0 are useful. Applying the confidence threshold within the mining operator, rather than in a separate filtering step afterwards, could substantially reduce the memory footprint and thus allow larger datasets to be analyzed.
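The requested behaviour can be sketched in a few lines. This is only an illustration of the idea, not RDFRules code: the `Rule` type, `simulated_candidates` and `keep_below` are hypothetical names, and the candidate stream is synthetic. It assumes confidence is available for each rule at the moment the miner emits it.

```python
from typing import Iterable, Iterator, NamedTuple


class Rule(NamedTuple):
    text: str
    confidence: float


def simulated_candidates(n: int) -> Iterator[Rule]:
    """Stand-in for a miner: lazily emits n rules, most with confidence 1.0."""
    for i in range(n):
        confidence = 0.5 if i % 100 == 0 else 1.0
        yield Rule(f"rule_{i}", confidence)


def keep_below(candidates: Iterable[Rule], max_confidence: float) -> Iterator[Rule]:
    """Drop rules at or above the cap as they are emitted.

    Because the input is consumed lazily, rejected rules are never stored.
    """
    for rule in candidates:
        if rule.confidence < max_confidence:
            yield rule


# Only the rules that pass the cap are ever materialised in the list.
retained = list(keep_below(simulated_candidates(10_000), max_confidence=1.0))
print(len(retained))  # 100
```

The difference from the current workflow is where the list is built: filtering after mining stores all 10,000 simulated rules first, whereas here only the 100 retained rules are held at any time.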

task (71).json.zip