propi / rdfrules

RDFRules: Analytical Tool for Rule Mining from RDF Knowledge Graphs
GNU General Public License v3.0

Merging Rule Sets with Different Indexes #61

Open nvkp opened 3 years ago

nvkp commented 3 years ago

The library currently cannot concatenate two rule sets that have different indexes. I have attached a script and sample data that reproduce the problem:

sbt clean compile run
rules count from the first task: 6
rules count from the second task: 45
[error] (run-main-0) java.util.NoSuchElementException: None.get
[error] java.util.NoSuchElementException: None.get
[error]         at scala.None$.get(Option.scala:529)
[error]         at scala.None$.get(Option.scala:527)
[error]         at com.github.propi.rdfrules.index.TripleItemIndex.getTripleItem(TripleItemIndex.scala:16)
...
... see stacktrace.txt
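
One plausible reading of this stack trace is that an identifier assigned by one rule set's index is being looked up in the other rule set's index, where it does not exist. The sketch below only illustrates that failure mode in a toy model; `ToyIndex`, `getUnsafe` and `getSafe` are hypothetical names and are not part of the RDFRules API.

```scala
// Toy model only: these names are hypothetical and are not the RDFRules API.
object IndexMismatchSketch {
  // A toy index maps integer IDs to the terms they encode.
  final case class ToyIndex(items: Map[Int, String]) {
    // Unsafe lookup: calls .get on the Option, so an unknown ID throws
    // java.util.NoSuchElementException.
    def getUnsafe(id: Int): String = items.get(id).get

    // Safe lookup: an unknown ID yields None instead of an exception.
    def getSafe(id: Int): Option[String] = items.get(id)
  }

  val indexA = ToyIndex(Map(0 -> "<Alice>", 1 -> "<knows>", 2 -> "<Bob>"))
  val indexB = ToyIndex(Map(0 -> "<Prague>", 1 -> "<locatedIn>"))

  // An ID that was assigned by indexA ...
  val idFromA = 2

  // ... cannot be resolved by indexB, which only knows IDs 0 and 1.
  val unsafeResult = scala.util.Try(indexB.getUnsafe(idFromA))
  val safeResult = indexB.getSafe(idFromA)
}
```

In this toy model the unsafe lookup fails while the safe lookup returns `None`, which is why rules from one index have to be re-encoded before they are combined with rules from another.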

I thought that if the indexes are not equal, they could be merged by transforming them into datasets; the rules could then be fitted into the new index, and a new rule set could be created with the new index and the rules from both rule sets. I cloned the current master branch and changed the +(ruleset: Ruleset) method of the Ruleset class in Ruleset.scala:

  def +(ruleset: Ruleset): Ruleset = {
    if (index.hashCode() == ruleset.index.hashCode()) {
      // Same index (compared by hash code): concatenate the rules directly.
      transform(rules.concat(ruleset.rules))
    } else {
      // Different indexes: merge them through their datasets and build a new index.
      val mergedDataset = index.toDataset + ruleset.index.toDataset
      val mergedIndex = mergedDataset.index()
      // Fit the rules from both rule sets into the merged index.
      val mergedRules = mergedIndex.tripleItemMap { implicit mapper =>
        rules.map(r => ResolvedRule.simple(r)) ++ ruleset.rules.map(r => ResolvedRule.simple(r))
      }.filter(_._2.isEmpty).map(_._1)
      new Ruleset(mergedRules, mergedIndex, parallelism, isCached)
    }
  }

and published the project locally:

sbt clean compile publishLocal

changed the dependency declaration:

libraryDependencies += "com.github.propi.rdfrules" %% "core" % "1.0.0"

and ran the script again:

sbt clean compile run
rules count from the first task: 6
rules count from the second task: 45
number of exported rules: 51
[success] Total time: 3 s, completed 20. 5. 2021 13:42:01
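
The merge idea described above (resolve each rule with its own index, then re-encode it against a combined index) can be sketched in a self-contained toy model. This is only an illustration under simplifying assumptions: `ToyIndex`, `ToyRuleset`, `Rule` and `merge` are hypothetical names, a rule is reduced to a sequence of term IDs, and none of this is the RDFRules API.

```scala
// Toy model only: hypothetical names, not the RDFRules API.
object MergeSketch {
  // A rule is reduced to a sequence of term IDs.
  type Rule = Vector[Int]

  // A toy index: the position of a term in the vector is its ID.
  final case class ToyIndex(items: Vector[String]) {
    def resolve(id: Int): Option[String] = items.lift(id)
    def encode(term: String): Option[Int] = {
      val i = items.indexOf(term)
      if (i >= 0) Some(i) else None
    }
  }

  final case class ToyRuleset(rules: Vector[Rule], index: ToyIndex)

  def merge(a: ToyRuleset, b: ToyRuleset): ToyRuleset = {
    // Combined index: all terms from both indexes, duplicates removed.
    val mergedIndex = ToyIndex((a.index.items ++ b.index.items).distinct)

    // Resolve every ID with the rule set's own index, then encode the
    // term with the merged index. Rules that cannot be fully mapped are dropped.
    def reencode(rs: ToyRuleset): Vector[Rule] =
      rs.rules.flatMap { rule =>
        val ids = rule.map(id => rs.index.resolve(id).flatMap(mergedIndex.encode))
        if (ids.forall(_.isDefined)) Some(ids.flatten) else None
      }

    ToyRuleset(reencode(a) ++ reencode(b), mergedIndex)
  }

  val a = ToyRuleset(Vector(Vector(0, 1, 2)), ToyIndex(Vector("<Alice>", "<knows>", "<Bob>")))
  val b = ToyRuleset(Vector(Vector(0, 1, 2)), ToyIndex(Vector("<Bob>", "<livesIn>", "<Prague>")))
  val merged = merge(a, b)
}
```

Both input rules use the IDs 0, 1, 2, but those IDs mean different terms in each index. After the merge, every ID in `merged.rules` refers to the shared `merged.index`, so the rules of both rule sets can be resolved with a single index.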

My Scala version:

$ scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc

attachments.zip

propi commented 3 years ago

Thank you for your recommendation. I will try to integrate it in a future version.