mozilla / mentat

UNMAINTAINED A persistent, relational store inspired by Datomic and DataScript.
https://mozilla.github.io/mentat/
Apache License 2.0

Core schema seems insufficient #756

Open · grigoryk opened this issue 6 years ago

grigoryk commented 6 years ago

If the core schema is viewed through the lens of multiple Mentat instances figuring out whether they're compatible with each other, it seems insufficient. Currently, even if two instances agree on the core schema, their transactions are not necessarily compatible with each other.

Quoting @ncalexan, "[when defining vocabularies] we can say a thing needs “to be” an enum (:db.type/ref) but we can’t say “and these are the valid enum cases :db.cardinality/*”. (And the transactor doesn’t handle that value restriction at all.)"

@rnewman you added the core schema initially; do you think it's reasonable to expand it going forward? Do you think having a subset of the bootstrap transaction defined as a special thing is valuable?

rnewman commented 6 years ago

Before I dig into this, one minor clarification: :db.type/ref doesn't mean enum. Some ref attributes are used as an enum-like thing, but not all, and enums aren't even necessarily named. Mentat doesn't provide a way to enforce that specific kind of enum, and doing so is actually not as simple as it looks, because you would need to also restrict the space of operations that can be performed on :db/ident. But I digress.
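
A rough sketch of what such an enum-like ref attribute looks like, in the same transaction notation used elsewhere in this thread (:monkey/species and the :species/* idents are illustrative names, not part of any real vocabulary):

```edn
;; An enum-like ref attribute. Only :db/ident, :db/valueType, and
;; :db/cardinality are core attributes; the rest are made-up names.
[[:db/add "species" :db/ident       :monkey/species]
 [:db/add "species" :db/valueType   :db.type/ref]
 [:db/add "species" :db/cardinality :db.cardinality/one]

 ;; The "enum cases" are just ordinary entities with idents. Nothing in
 ;; the schema ties them to :monkey/species, so the transactor will
 ;; accept a ref to any entity at all.
 [:db/add "case-a" :db/ident :species/pan_troglodytes]
 [:db/add "case-b" :db/ident :species/callicebus_miltoni]]
```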

Let me restate, see if I understand what you're asking.

You're saying that two Mentat instances can agree on the core schema and still produce transactions that aren't compatible with each other, because the core schema doesn't capture restrictions such as the valid set of enum cases for an attribute.

You are correct, but it kinda doesn't matter.

There will always be cases in which the meaning of a schema changes over time; we (and Datomic) have picked/inherited a number of common axes that we directly support (cardinality, doc, uniqueness, etc.), and there are others that we don't ("metaness", permanence/transience, etc.).

Even within the set of properties we model in the core schema there are domain-level concepts that can change in a way we don't formally describe.

There are three ways to represent that in the vocabulary system:

  1. If the change is backward-compatible (in a scenario as outlined above, typically this will be because the restricted set isn't so restricted after all — imagine a property like :monkey/species, where the introduction of :species/callicebus_miltoni in 2015 is A-OK), we can simply begin using the new 'enum case', and older clients will probably behave correctly (see the sketch after this list).
  2. If the change is not backward-compatible, but only narrows the semantics of the property (:height/very-tall is a subset of :height/tall), then we can add a second attribute and keep writing the first. This might well be a data modeling error — we should have recorded :person/height instead!
  3. If the change is not backward-compatible, and (a) there's no way to phrase new data in a way that old clients can handle it safely, and (b) data old clients write should not be missing the new aspect when read by new clients, then we should bump the version number of the vocabulary.
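
For instance, the first case amounts to nothing more than transacting a new ident and then using it (a sketch, with illustrative names; the tempid string in value position is assumed to resolve as a ref):

```edn
;; Case 1: a backward-compatible extension is just another ident, plus
;; data that uses it. Older clients that treat the attribute as an
;; opaque ref keep working.
[[:db/add "new-case"    :db/ident       :species/callicebus_miltoni]
 [:db/add "observation" :monkey/species "new-case"]]
```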

The version number of the vocabulary exists precisely to model this kind of exclusion, where the vocabulary cannot simply be extended implicitly but needs to be replaced.

I don't think it's all that feasible to model every possible restriction in the schema language itself: after all, there are restrictions on individual values, restrictions that relate multiple attributes, and application-level invariants that no schema language can fully capture.

I used the vocabulary version number to allow developers to indicate that one of these constraints has changed.

(This kind of sophistication is one reason why even complicated SQL databases support triggers and stored procedures to impose computed constraints!)

To return to your question:

The reason you're asking is that merging two databases which have the same core vocabulary version but allow different enum cases for a schema-related attribute (cardinality being one such) will break Mentat.

The weaker version of that scenario is that merging two databases which have the same non-core vocabulary version, but which use different enum cases for one of its non-schema attributes, might break application code.

You have two choices here.

The first, which is the one I was taking, is to say: don't do that. If you have an attribute that has a limited range of acceptable entities, then when you add another such entity (you'll find yourself writing [:db/add "foo" :db/ident :my/ident]) you must do exactly the same thing you would do when you make a non-back-compat change to a vocabulary: bump the vocabulary version.
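
A sketch of what that looks like, assuming the core :db.schema/version attribute is how a vocabulary's version is recorded (the :my/vocab name is illustrative):

```edn
;; Adding a case to a closed set is treated like any other
;; non-back-compat vocabulary change: transact the new ident, then bump
;; the vocabulary's version.
[[:db/add "new-case" :db/ident :my/ident]
 [:db/add :my/vocab :db.schema/version 2]]
```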

The second is to say that this is something we'll model in vocabulary and track in Mentat. Perhaps :db/valueType :db.type/closed-enum, and a way to write out the cases.
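
Purely as a hypothetical sketch (neither :db.type/closed-enum nor a :db.enum/cases attribute exists today), the cases might be written out like this:

```edn
;; Hypothetical only: one possible way to enumerate the cases so the
;; transactor could validate refs against them. All non-core names are
;; made up for illustration.
[[:db/add "species" :db/ident      :monkey/species]
 [:db/add "species" :db/valueType  :db.type/closed-enum]
 [:db/add "species" :db.enum/cases :species/pan_troglodytes]
 [:db/add "species" :db.enum/cases :species/callicebus_miltoni]]
```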

If you go this route you will need to validate in the transactor, record those enums before syncing values, decide whether the enum cases can shrink and/or grow… and you still haven't solved the problem, because if the enum set can't change backward-compatibly (and for cardinality it cannot), then the developer still needs to bump the version, or at least handle the case where Mentat complains that the remote timeline has a different enum set.

Ultimately you are bumping into the question of what to do when fundamental change occurs. My position is that we should support indicating, migrating, and detecting, but that we cannot transform all kinds of fundamental change into automatic change.

rnewman commented 6 years ago

By the way: whenever the set of supported types is changed (and there are several such changes on the list), we will need to bump the core vocabulary version. Remember that older clients might not even be able to represent those newer types — they might need a different SQLite schema!

Locking out clients will probably be an infrequent event, but we cannot eliminate it entirely.