typedb / typeql

TypeQL: the modern query language of TypeDB
https://typedb.com
Mozilla Public License 2.0
217 stars 45 forks source link

Introduce syntax for interface covariance #325

Open james-whiteside opened 6 months ago

james-whiteside commented 6 months ago

Current variance behaviour

Currently, definitions of interface implementations (i.e. owns and plays statements) are covariant in the implementing type (the owner or roleplayer), and invariant in the implemented interface (the ownership or role). Consider the following example.

define

animal sub entity, abstract;
animal-group sub relation,
    relates group-member;

bird sub animal;
elephant sub animal;
whale sub animal;

goose sub bird;
magpie sub bird;
heron sub bird;

flock sub animal-group,
    relates flock-member as group-member;
herd sub animal-group,
    relates herd-member as group-member;
pod sub animal-group,
    relates pod-member as group-member;

animal plays animal-group:group-member;
bird plays flock:flock-member;
elephant plays herd:herd-member;
whale plays pod:pod-member;

In this case, bird is able to play flock:flock-member because the implementation by bird is covariant, but animal is not able to play the role because the interface itself is invariant. This is correct behaviour. If we translate the schema into natural language, we'd get the following facts.

Based on these facts, the following deduction is correct by modus ponens.

However, the following deduction is incorrect by affirming the consequent.

The current behaviour of TypeQL is consistent with these rules of predicate logic. In fact, to be completely correct, interfaces would need to be contravariant rather than invariant, but that's not the point of this issue.

TypeQL's key value prop

One of TypeQL's most valuable features is its declarativity. The fact that goose can play flock:flock-member because it is a subtype of bird without us having to separately declare it is a big value prop. If we introduce new subtypes of bird, then they will get this capability automatically. This is consistent with the fact "all birds can join flocks".

Compare the approach of a relational database, where we'd have to explicitly declare the behaviour for every subtype. Then, if we introduce a new subtype of bird, it doesn't automatically get the capability. This is inconsistent with the fact "all birds can join flocks". The fact actually expressed by the relational database is "all geese, magpies, and herons can join flocks", so naturally the new bird subtype wouldn't gain the capability.

The key value prop of TypeQL, when compared with SQL, is that we are able to accurately express the fact "all birds can join flocks".

The gap in expressivity

Let's say we want to allow any animal to join any animal group, strange as it would be. The current approach to doing this in TypeQL would involve declaring animal to play every group member subtype, for instance as follows.

animal plays flock:flock-member;
animal plays herd:herd-member;
animal plays pod:pod-member;

If we were to introduce a new animal group, for instance a pride, then all animals would not automatically gain the capability to join prides. This is inconsistent with the intended behaviour, which would be represented by the fact "all animals can join all animal groups". The fact actually expressed is "all animals can join flocks, herds, and pods".

In this instace, the expressivity of TypeQL does not exceed that of SQL.

Filling the gap

While we certainly want invariant (or contravariant) interfaces, we also want to be able to have covariant interfaces. This would allow both of the following facts to be accurately expressed in TypeQL.

Currently, only the former can be accurately expressed. This has many more realistic applications, for instance allowing any person to be any kind of employee, even kinds that have not been defined yet, which is expressed as "all people can be all types of employee". The inability to express this is contrary to our key value prop.

Proposed syntax

Introduce new keyword variants owns* and plays*. These would cause interface implementations to be covariant in implementing type, and covariant (or bivariant) in the implemented interface.

animal plays* animal-group:group-member;
cxdorn commented 6 months ago

We discussed the plays* and owns* keyword before. There is nothing on the modeling theory side that fundamentally obstructs introducing them. Under the hood, they could act like macros that are simply expanded into plays and owns declarations for all sub-interfaces (kept up-to-date when the schema changes of course). That being said, we should probably spend more time gauging interest and relevance before introducing the new * operator.

brettforbes commented 5 months ago

There is a need for the replacement of a relation on an attribute. This replacement could used to accomplish high-value, broadly applicable infrastructure capabilities that require relations linking to attributes where a relation owning all attributes becomes impractical due to the sheer number of attributes,

Example use cases include:

  1. version control,
  2. temporal access records (CRUD), and
  3. cybersecurity markings

    The aim should be to build this capability as modular extensions to the current IAM.

Instead of needing a relation owning >400 attributes, maybe the above approach would work, if we

  1. Base all attributes on an abstract class *we already do)
  2. Setup the relation owning the abstract classes with the syntax above.

Question