mozilla / mentat

UNMAINTAINED A persistent, relational store inspired by Datomic and DataScript.
https://mozilla.github.io/mentat/
Apache License 2.0

Please provide a glossary of mentat technical terms #669

Open rfk opened 6 years ago

rfk commented 6 years ago

In various slack discussions, I regularly come across terms that I can vaguely intuit the meaning of, but can't attach precise semantics to. Please provide a glossary of terms where they mean specific things in mentat. I can think of the following off the top of my head:

But I'll probably come up with more over the next few days and add them to this issue :-)

ncalexan commented 6 years ago

I'll make some notes here, and we can figure out a reasonable way to surface the copy more generally as we get closer to external consumers.

By vocabulary, we mean a programmatic layer of functionality on top of Mentat's schema primitives that supports creating and evolving applications. Mentat supports relatively rich datatypes (compared to SQL: for example, Mentat supports booleans, keywords, UUIDs, etc.), relatively spare relationships (compared to SQL: for example, SQL supports complex primary and foreign keys, and multi-column uniqueness constraints), and relatively rich schema versioning (compared to SQL: SQLite, for example, exposes only a single 32-bit integer "user version"). Mentat's vocabulary layer implements that rich schema versioning. It lets you define an abstract set of related attributes (called a vocabulary definition), ensure that a concrete store is configured with the given definition, and evolve that definition forward atomically.
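To make the three operations above concrete (define, ensure, evolve), here is a rough Python sketch. It is a toy model and not Mentat's API: `Attribute`, `Definition`, `Store`, `ensure_vocabulary`, and the `:org.example/people` vocabulary are all names invented for illustration, and "atomically" is only approximated by staging changes on copies before swapping them in.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Attribute:
    """Schema primitives for one attribute (hypothetical, heavily simplified)."""
    value_type: str                 # e.g. "string", "boolean", "uuid"
    cardinality: str = "one"        # "one" or "many"
    unique: Optional[str] = None    # None, "value", or "identity"


@dataclass
class Definition:
    """A named, versioned set of related attributes."""
    name: str
    version: int
    attributes: Dict[str, Attribute]


class Store:
    """Toy store that only tracks installed vocabularies and attributes."""

    def __init__(self):
        self.vocabularies = {}  # vocabulary name -> installed version
        self.schema = {}        # attribute ident -> Attribute

    def ensure_vocabulary(self, definition):
        """Install or upgrade so the store matches `definition`."""
        installed = self.vocabularies.get(definition.name)
        if installed == definition.version:
            return "present"
        if installed is not None and installed > definition.version:
            raise ValueError("store has a newer version of " + definition.name)
        # Stage the change on copies and swap both in together, so an
        # error part-way through leaves the store untouched.
        schema = dict(self.schema)
        schema.update(definition.attributes)
        vocabularies = dict(self.vocabularies)
        vocabularies[definition.name] = definition.version
        self.schema, self.vocabularies = schema, vocabularies
        return "installed" if installed is None else "upgraded"


# Hypothetical vocabulary, and a later version that adds one attribute.
people_v1 = Definition(":org.example/people", 1, {
    ":person/name": Attribute("string"),
    ":person/email": Attribute("string", unique="identity"),
})
people_v2 = Definition(":org.example/people", 2, {
    **people_v1.attributes,
    ":person/verified": Attribute("boolean"),
})
```

Calling `ensure_vocabulary(people_v1)` on a fresh `Store` installs the vocabulary, calling it again is a no-op, and calling it with `people_v2` moves the store forward. A real implementation would also need per-version migration logic, which this sketch leaves out.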

Smushing I'm not going to define quite yet, because it's a rather speculative term for something that we haven't yet settled on. For context, we're groping for the term that means "identify two datoms in two different stores". In the simplest case of a :db/unique :db.unique/identity attribute, it's clear that

[e :unique/attribute value]

and

[f :unique/attribute value]

should be "smushed", but it's not yet clear what should happen with, say, :db/cardinality :db.cardinality/one datoms. In some cases (say, ancestor relations like :mother-is or :father-is) the domain supports identifying mothers across stores, even in the absence of a unique-identity attribute. But in general this gets very complicated. And we're really not close to specifying "smushing" for content-aware merging applications, where the set of attributes is not a singleton. It's tricky -- and that's why I say that truly automated synchronization is still a research problem.
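For the simple unique-identity case only, the idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a proposed algorithm: the function name `smush`, the tuple representation of datoms, and the example data are made up, entity ids are assumed not to collide between the two stores, and values that refer to other entities are not rewritten.

```python
def smush(local, remote, identity_attrs):
    """Merge `remote` datoms into `local`.

    Datoms are (entity, attribute, value) tuples. A remote entity that
    shares an (attribute, value) pair with a local entity on one of the
    `identity_attrs` is treated as the same entity as the local one.
    """
    # Index local entities by their unique-identity (attribute, value) pairs.
    by_identity = {(a, v): e for (e, a, v) in local if a in identity_attrs}

    # Work out which remote entity ids map onto existing local ids.
    rename = {}
    for (e, a, v) in remote:
        if (a, v) in by_identity:
            rename[e] = by_identity[(a, v)]

    # Rewrite remote datoms onto local ids and union the two sets.
    merged = set(local)
    for (e, a, v) in remote:
        merged.add((rename.get(e, e), a, v))
    return merged


local = {
    ("l1", ":person/email", "a@example.com"),
    ("l1", ":person/name", "Alice"),
}
remote = {
    ("r9", ":person/email", "a@example.com"),
    ("r9", ":person/name", "Alice L."),
    ("r10", ":person/email", "b@example.com"),
}
merged = smush(local, remote, {":person/email"})
```

After the merge, `l1` carries both `"Alice"` and `"Alice L."` for `:person/name`. If that attribute is cardinality-one, this is exactly the unresolved question described above: the sketch identifies the entities but does not decide which value wins.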

rnewman commented 6 years ago

I'd say, in brief:

rfk commented 6 years ago

Thanks!

Is "vocabulary" similar to a "schema" in relational db terminology? In what ways do they differ?

Is "smushing" only relevant in the context of syncing, or might it come up in edge-cases of a db on a single machine?

rnewman commented 6 years ago

A relational DB’s schema performs multiple roles:

There are multiple SQL schemas that can represent the same domain model.

A Mentat vocabulary doesn’t specify anything about the on-disk structure of data. It’s also simpler, and there are fewer correct ways to skin the cat. But more importantly, each vocabulary is composable with other vocabularies, because each attribute is scoped and each vocabulary is named and versioned, and the space of entities is shared.

Internally we have two concepts that we call a ‘schema’:

That is: Mentat internally has a ‘schema’, which it uses to understand inputs. Application code works with vocabularies.

rnewman commented 6 years ago

Smushing is usually a sync-related thing. The main reason for that is that local application code does something similar as it goes: upserting and lookup refs allow transacted data to find existing entities by reference. I see that as a kind of smushing, but without creating an interim entity.
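As a rough illustration of that local behaviour, the Python sketch below shows a toy store where transacting data with a unique-identity attribute finds the existing entity instead of creating a new one, and where a lookup ref resolves an (attribute, value) pair to an entity id. `ToyStore`, `lookup`, and `transact` are hypothetical names, not Mentat's API, and cardinality is ignored, so every asserted value is simply added.

```python
class ToyStore:
    """Minimal in-memory datom store with upsert on identity attributes."""

    def __init__(self, identity_attrs):
        self.identity_attrs = set(identity_attrs)
        self.datoms = set()   # (entity, attribute, value) tuples
        self.next_id = 1

    def lookup(self, attr, value):
        """Resolve a lookup ref (attr, value) to an entity id, or None."""
        for (e, a, v) in self.datoms:
            if a == attr and v == value:
                return e
        return None

    def transact(self, entity_map):
        """Assert an attribute -> value map and return the entity id used."""
        entity = None
        # Upsert: reuse an existing entity if any identity attribute matches.
        for attr, value in entity_map.items():
            if attr in self.identity_attrs:
                entity = self.lookup(attr, value)
                if entity is not None:
                    break
        # No match, so allocate a fresh entity id.
        if entity is None:
            entity = self.next_id
            self.next_id += 1
        for attr, value in entity_map.items():
            self.datoms.add((entity, attr, value))
        return entity


store = ToyStore(identity_attrs={":person/email"})
first = store.transact({":person/email": "a@example.com", ":person/name": "Alice"})
second = store.transact({":person/email": "a@example.com", ":person/age": 30})
other = store.transact({":person/email": "b@example.com"})
```

Here `first` and `second` refer to the same entity because the email matched, so no interim entity is ever created and later merged. `other` gets a new id because nothing in the store shares its email.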

You can imagine systems that merge entities in purely local code, too, but they’re not as common as simple CRUD apps.