vaticle / typedb

TypeDB: the polymorphic database powered by types
https://typedb.com
Mozilla Public License 2.0
3.73k stars 338 forks source link

Support general data integrity constraints #6877

Open cxdorn opened 10 months ago

cxdorn commented 10 months ago

Problem to Solve

TypeQL is missing user-defined data integrity constraints. Such constraints mirror complex dependencies between data. Well-studied dependencies, e.g., include the following (some of these may be addressed in other ways, see below):

All of these can be addressed individually (and some are on the roadmap already, like cardinality constraints). However, it would also be extremely useful to have a user-defined general-purpose mechanism for integrity constraints.

Current Workaround

Currently, the user can manually check whether their data satisfies their specific constraint. For instance, to check for a functional dependency the user could run:

match
    $x isa some-thing;
    $y isa some-thing;
    $z isa some-thing;
    ($x, $y) isa some-relation;
    ($x, $z) isa some-relation;
    not { $y = $z; };

and see whether the query returns anything (and if it does, the functional dependency has been violated).

Proposed Solution

Introduce a keyword constraint which is directly modelled on the mechanics of the rule keyword. (Note this could easily exploit the existing infrastructure for inference with rules.) For example, a functional dependency could be expressed by the user as:

constraint my-constraint:
when {
    $x isa some-thing;
    $y isa some-thing;
    $z isa some-thing;
    ($x, $y) isa some-relation;
    ($x, $z) isa some-relation;
    not { $x = $y; };
} then { 
    raise Error("Detected violation of my-constraint in some-relation");
}

Additional Information

User-defined constraints can be very expensive, since they have to be checked at the end of each data write transaction---this drawback should be communicated clearly to the users. But as long as the risks are known to the user, custom constraints could be very powerful tool for many modeling problems.

cxdorn commented 10 months ago

As @flyingsilverfin pointed out, this doesn't play well with truly concurrent data write transactions since writes in one transaction may affect constraints of another. The way to address this should be the same as other constraints on the roadmap (like cardinality constraints).

flyingsilverfin commented 10 months ago

@cxdorn by the way, my point about enforcing cardinality constraints as a separate annotation versus a general 'pattern' being easier to do is such:

If we have a specific annotation for cardinality restriction, we can at commit time compare with concurrent transactions to validate that the exact concept being modified, assuming some serial order of transactions, would not together violate cardinality. This is a 'local' check around 1 concept at a time and can be done efficiently.

Doing it at a general pattern with extra conditions is more challenging, since we have to essentially run the pattern match across the aggregated result of two transactions as a whole (remember each transaction has its own indepent point-in-time snapshot of the database it operates off) - there could be requirements on cardinality that depend on some other condition being satisifed that has to be checked as well. In short, the full solution involves a much less 'local' check than a simple count on the connections from a concept, which is much harder to implement & less efficient in general.

There may be some solution that does it elegantly, but at first look it seems significantly more complex.

cxdorn commented 10 months ago

@flyingsilverfin By 'complex' you mean 'computationally expensive' right? Not 'implementation-wise complex'?

flyingsilverfin commented 10 months ago

Also implementation complexity, since each transaction has independent snapshots of data and for basic cardinality we only have to 'share' counts of things that have been changed across the snapshot boundaries. To do general constraints we have to 'share' the the full data-level change to the snapshot.

flyingsilverfin commented 10 months ago

But again, there might be an elegant solution where the simple cardinality constraint emerges from the constraint implementation with the same efficiency & simplicity