knowsys / rulewerk

Java library based on the VLog rule engine
Apache License 2.0
32 stars 13 forks source link

Unexpected transformation of NamedNulls into AbstractConstants during reasoning #202

Open francesco-kriegel opened 3 years ago

francesco-kriegel commented 3 years ago

Hi,

I am using VLog through your Java adapter with the goal of enumerating homomorphisms, which essentially works. However, I recognized that, during reasoning, instances of NamedNull occurring in a knowledge base will be transformed into instances of AbstractConstant. The situation can be reproduced by means of the following steps:

  1. Create or parse an instance kb of KnowledgeBase in which instances of NamedNull occur.
  2. Initialize an instance reasoner of VLogReasoner with argument kb.
  3. Call reasoner.reason().
  4. Walk through reasoner.getInferences(), which does not contain the facts in kb anymore but rather copies of them where each instance of NamedNull is replaced by an according instance of AbstractConstant.

I already had a short conversation with @mmarx and he explained that this happens due to skolemization, and he advised me to open an issue report here.

In my humble opinion, I would rather expect the chase to be a superset of the given knowledge base. Furthermore---please correct me if I am wrong---I interpret a constant as an object entity with a fixed, known name and a null as an object entity without a name (but which we give an internal name within implementations to make them accessible)---in that sense I find it problematic to transform nulls into constants, since a name for an object entity without a known name is just invented, which seems to be logically wrong, and further since it gets difficult to recognize which fresh abstract constant was introduced for a named null in the input. Of course, there is an easy workaround by just using abstract constants and storing those that one considers as nulls in a set, but then the distinction between constants and nulls by the datatype is lost.

Thanks in advance for your reaction.

Best regards, Francesco

mkroetzsch commented 3 years ago

VLog does not support named nulls in syntactic inputs. Rulewerk allows them on the Java level, since inferred facts may contain nulls. If nulls are also used in input data, Rulewerk will skolemize them for you before sending them to VLog (rather than just throwing an error).

In general, it is not recommended to use nulls in input data. They have no advantage over constants, but several disadvantages:

I am not sure what the resolution to this bug should be. Maybe some documentation to help users understand the expected behaviour? We could also throw an exception when users try to give nulls in inputs to VLog, but I am not really in favour of this.

francesco-kriegel commented 3 years ago

I am also unsure what the solution to this bug could be. Specifically, I think that it depends on the point of view.

For our prototypical implementation of an ABox repair approach [2], we employ the latter point of view. As datasets we use so-called "quantified ABoxes" [1], which are essentially ABoxes with variables (anonymous individuals). As a starting point for constructing the actual repair, we use the so-called CQ-saturation, which is essentially the chase of the ABox w.r.t. the TBox. Exactly for this purpose we are using your code. However, for correctness of the repair it is essential that there is a clear distinction between the constants (named individuals) and the nulls (variables, anonymous individuals). This distinction is lost due to skolemization.

In essence, skolemization of such a quantified ABox would produce a conservative extension w.r.t. individual names, which might logically be fine if this is expected behaviour, but skolemization does not produce a logical consequence. For instance the dataset { A(x) } where x is a null does not entail its skolemization { A(a) } where a is a fresh constant. Since we require that a repair is a logical consequence of the input dataset, we can thus not rely on the skolemization and its chase without further ado.

As I already wrote in my first comment, I would expect that the chase is a superset of the input dataset.

Documenting this behaviour will definitely help, maybe also writing a warning message to the logger if the input contains nulls or even crashing in such cases.

As a workaround in the cases where the user expects that the chase is a logical consequence of the given dataset plus the given ruleset, either the developer using your code could take care of the transformation of nulls to constants and do the back-transformation afterwards herself/himself, or Rulewerk could maintain a map from the nulls to the according Skolem constants and do the back-transformation before the chase is returned.

-- References:

[1] Franz Baader, Francesco Kriegel, Adrian Nuradiansyah, and Rafael Peñaloza: Computing Compliant Anonymisations of Quantified ABoxes w.r.t. EL Policies. ISWC 2020. https://lat.inf.tu-dresden.de/research/papers/2020/BaKrNuPe-ISWC2020.pdf

[2] Franz Baader, Patrick Koopmann, Francesco Kriegel, and Adrian Nuradiansyah: Computing Optimal Repairs of Quantified ABoxes w.r.t. Static EL TBoxes (Extended Version). LTCS-Report 21-01. https://lat.inf.tu-dresden.de/research/reports/2021/BaKoKrNu-LTCS-21-01.pdf