Introduce syntax for interface covariance

james-whiteside commented 8 months ago

Current variance behaviour

Currently, definitions of interface implementations (i.e. owns and plays statements) are covariant in the implementing type (the owner or roleplayer), and invariant in the implemented interface (the ownership or role). Consider the following example.

define

animal sub entity, abstract;
animal-group sub relation,
    relates group-member;

bird sub animal;
elephant sub animal;
whale sub animal;

goose sub bird;
magpie sub bird;
heron sub bird;

flock sub animal-group,
    relates flock-member as group-member;
herd sub animal-group,
    relates herd-member as group-member;
pod sub animal-group,
    relates pod-member as group-member;

animal plays animal-group:group-member;
bird plays flock:flock-member;
elephant plays herd:herd-member;
whale plays pod:pod-member;

In this case, bird is able to play flock:flock-member because the implementation by bird is covariant, but animal is not able to play the role because the interface itself is invariant. This is correct behaviour. If we translate the schema into natural language, we'd get the following facts.

All birds are animals.
All elephants are animals.
All whales are animals.
All geese are birds.
All magpies are birds.
All herons are birds.
All flocks are animal groups.
All herds are animal groups.
All pods are animal groups.
All animals can join animal groups.
All birds can join flocks.
All elephants can join herds.
All whales can join pods.

Based on these facts, the following deduction is correct by modus ponens.

All geese can join flocks.

However, the following deduction is incorrect by affirming the consequent.

All animals can join flocks.

The current behaviour of TypeQL is consistent with these rules of predicate logic. In fact, to be completely correct, interfaces would need to be contravariant rather than invariant, but that's not the point of this issue.

TypeQL's key value prop

One of TypeQL's most valuable features is its declarativity. The fact that goose can play flock:flock-member because it is a subtype of bird without us having to separately declare it is a big value prop. If we introduce new subtypes of bird, then they will get this capability automatically. This is consistent with the fact "all birds can join flocks".

Compare the approach of a relational database, where we'd have to explicitly declare the behaviour for every subtype. Then, if we introduce a new subtype of bird, it doesn't automatically get the capability. This is inconsistent with the fact "all birds can join flocks". The fact actually expressed by the relational database is "all geese, magpies, and herons can join flocks", so naturally the new bird subtype wouldn't gain the capability.

The key value prop of TypeQL, when compared with SQL, is that we are able to accurately express the fact "all birds can join flocks".

The gap in expressivity

Let's say we want to allow any animal to join any animal group, strange as it would be. The current approach to doing this in TypeQL would involve declaring animal to play every group member subtype, for instance as follows.

animal plays flock:flock-member;
animal plays herd:herd-member;
animal plays pod:pod-member;

If we were to introduce a new animal group, for instance a pride, then all animals would not automatically gain the capability to join prides. This is inconsistent with the intended behaviour, which would be represented by the fact "all animals can join all animal groups". The fact actually expressed is "all animals can join flocks, herds, and pods".

In this instace, the expressivity of TypeQL does not exceed that of SQL.

Filling the gap

While we certainly want invariant (or contravariant) interfaces, we also want to be able to have covariant interfaces. This would allow both of the following facts to be accurately expressed in TypeQL.

All animals can join animal groups.
All animals can join all animal groups.

Currently, only the former can be accurately expressed. This has many more realistic applications, for instance allowing any person to be any kind of employee, even kinds that have not been defined yet, which is expressed as "all people can be all types of employee". The inability to express this is contrary to our key value prop.

Proposed syntax

Introduce new keyword variants owns* and plays*. These would cause interface implementations to be covariant in implementing type, and covariant (or bivariant) in the implemented interface.

animal plays* animal-group:group-member;

cxdorn commented 8 months ago

We discussed the plays* and owns* keyword before. There is nothing on the modeling theory side that fundamentally obstructs introducing them. Under the hood, they could act like macros that are simply expanded into plays and owns declarations for all sub-interfaces (kept up-to-date when the schema changes of course). That being said, we should probably spend more time gauging interest and relevance before introducing the new * operator.

brettforbes commented 8 months ago

There is a need for the replacement of a relation on an attribute. This replacement could used to accomplish high-value, broadly applicable infrastructure capabilities that require relations linking to attributes where a relation owning all attributes becomes impractical due to the sheer number of attributes,

Example use cases include:

version control,
temporal access records (CRUD), and
cybersecurity markings

The aim should be to build this capability as modular extensions to the current IAM.

Instead of needing a relation owning >400 attributes, maybe the above approach would work, if we

Base all attributes on an abstract class *we already do)
Setup the relation owning the abstract classes with the syntax above.

Question

would the return include the child attribute name as well as the value

typedb / typeql