knowsys / nemo

A fast in-memory rule engine
https://knowsys.github.io/nemo-doc/
Apache License 2.0

Design and implement type system in logical layer #85

Closed: aannleax closed this issue 1 year ago

aannleax commented 1 year ago

As part of #70, we planned to have a separate type system on the physical layer and the logical layer. While the former deals with low-level types like u32 or floats, the latter deals with more abstract concepts like "String", "Integer" and so on. The task is to implement this functionality.

Some more considerations:

mkroetzsch commented 1 year ago

Seems a bit underspecified for a single issue. So let me first clarify the key aspects for the basic design:

Note that, when implementing more of the XML Schema types, one also has to add some internal types to ensure that the "hierarchy" is actually a sup-semilattice (some pairs of types have several upper bounds in XSD but no least, i.e. most specific, one). But that only comes up when considering much more specific types than initially planned.

Then, for the above questions in their order:

  1. Yes, kind of. One might want to avoid a (presumably slow) conversion of EDB data into logical objects as an intermediate stage of parsing, but some cases naturally require such a conversion (e.g., for constants in rules).
  2. Yes, this is the notion of well-typed programs. The first implementation only needs to check if the (pre-defined) types are compatible on this level. Type inference (for predicates not typed explicitly) could come later. Note that allowing rules to be added might affect inferred types (making them wider, possibly conflicting with the supremum-based type computation that was used to make some previously known rules well-typed).
  3. This item mixes up several things. Parsing is a separate issue (a single parsed term should always have a specific type that does not depend on the schema; we determine later whether this specific value can be converted to the type required in that column). The possible differences between the CSV and RDF representations of typed values are also an unrelated concern. Code for interpreting RDF-style term strings into values should be shared among all parsers that support such syntax, but one can also have parsers that deliberately use other forms for representing the same values (e.g., a JSON parser would have its own double representation).
  4. Yes, that is part of the basic design. The source declarations already give arity, so one can easily imagine how to generalize this. For predicates that are not from a source, one could have a separate form of declaration -- or fully rely on inferred types from the start (with a small initial type set, inference might be easy).
  5. Maybe, or maybe not. I would keep this for later, since it is a harder problem. We can always take "no declaration" to mean "rdfs:Resource", but note that EDBs always need to be declared with at least an arity anyway, so saying nothing at all is not an option now. I would rather consider [2] a syntax-level alias for [rdfs:Resource,rdfs:Resource] than do anything "smart" that becomes a source of confusing errors when it behaves unexpectedly.
  6. We can of course support distinct types with the same internal structures and handling. But that's not really a design decision, is it? RDF does indeed have a number of integer types down to bool, which would maybe not all be implemented differently in the physical layer.

monsterkrampe commented 1 year ago

Related discussion: #185

monsterkrampe commented 1 year ago

A basic type system with Any, String, Integer, and Float64 has been implemented. Since there is no type casting during reasoning yet, only Any and String are compatible with each other, because they happen to share the same physical representation.

Still, since this marks a basic milestone of the type system development, I hereby close this issue. We should open new issues for the remaining open points and use the existing discussion here for clarifying additional questions.