Please provide a glossary of mentat technical terms

rfk commented 6 years ago

In various Slack discussions, I regularly come across terms whose meaning I can vaguely intuit but can't attach precise semantics to. Please provide a glossary of terms that mean specific things in Mentat. I can think of the following off the top of my head:

Vocabulary
Smushing

But I'll probably come up with more over the next few days and add them to this issue :-)

ncalexan commented 6 years ago

I'll make some notes here, and we can figure out a reasonable way to surface the copy more generally as we get closer to external consumers.

By vocabulary, we mean a programmatic layer of functionality on top of Mentat's schema primitives that supports creating and evolving applications. Compared to SQL, Mentat supports relatively rich datatypes (for example, booleans, keywords, and UUIDs), relatively spare relationships (SQL, by contrast, supports complex primary and foreign keys and multi-column uniqueness constraints), and relatively rich schema versioning (SQLite, for example, exposes only a single 32-bit integer "user version"). Mentat's vocabulary layer implements that rich schema versioning. It lets you define an abstract set of related attributes (called a vocabulary definition), ensure that a concrete store is configured with the given definition, and evolve that definition forward atomically.
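To make the "define, ensure, evolve" cycle concrete, here is a minimal sketch in Rust. It is not Mentat's actual API: `Definition`, `Outcome`, `Store`, and `ensure` are hypothetical names invented for illustration, and the "store" is just an in-memory map. It only shows the version comparison that a vocabulary layer might perform.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified vocabulary definition: a named, versioned
/// set of attribute names. Not Mentat's real type.
#[derive(Clone, Debug, PartialEq)]
pub struct Definition {
    pub name: String,
    pub version: u32,
    pub attributes: Vec<String>,
}

/// What happened when a definition was checked against a store.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Installed,
    Unchanged,
    Upgraded { from: u32 },
}

/// Hypothetical stand-in for a store: only the vocabularies it knows about.
#[derive(Default)]
pub struct Store {
    installed: BTreeMap<String, Definition>,
}

impl Store {
    /// Make sure the store holds `wanted`, installing or upgrading as needed.
    /// Refuses to replace a newer version with an older one.
    pub fn ensure(&mut self, wanted: &Definition) -> Result<Outcome, String> {
        // Copy the installed version out so no borrow of the map remains.
        let current_version = self.installed.get(&wanted.name).map(|d| d.version);
        match current_version {
            None => {
                self.installed.insert(wanted.name.clone(), wanted.clone());
                Ok(Outcome::Installed)
            }
            // Same version: assume the attributes already match.
            Some(v) if v == wanted.version => Ok(Outcome::Unchanged),
            Some(v) if v < wanted.version => {
                // A real store would apply this change in a single transaction.
                self.installed.insert(wanted.name.clone(), wanted.clone());
                Ok(Outcome::Upgraded { from: v })
            }
            Some(v) => Err(format!(
                "store has {} at version {}, newer than requested version {}",
                wanted.name, v, wanted.version
            )),
        }
    }
}
```

Calling `ensure` twice with the same definition installs it once and then reports `Unchanged`; calling it with a higher version reports `Upgraded`. The sketch deliberately ignores what migration work an upgrade would involve.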

Smushing I'm not going to define quite yet, because it's a rather speculative term for something that we haven't yet settled on. For context, we're groping for the term that means "identify two datoms in two different stores". In the simplest case of a :db/unique :db.unique/identity attribute, it's clear that

[e :unique/attribute value]

and

[f :unique/attribute value]

should be "smushed", but it's not yet clear what should happen with, say, :db/cardinality :db.cardinality/one datoms. In some cases (say, ancestor relations like :mother-is or :father-is) the domain supports identifying mothers across stores, even in the absence of a unique-identity attribute. But in general this gets very complicated. And we're really not close to specifying "smushing" for content-aware merging applications, where the set of attributes is not a singleton. It's tricky -- and that's why I say that truly automated synchronization is still a research problem.
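The simplest case above, matching entities through a unique-identity attribute, can be sketched as follows. This is an illustration only, not how Mentat works internally: the `Datom` tuple, the `identify` function, and the use of plain strings for values are all assumptions made to keep the example short.

```rust
use std::collections::HashMap;

/// A simplified datom: (entity, attribute, value). Values are plain
/// strings here, which is an assumption made for brevity.
pub type Datom = (u64, String, String);

/// Pair up entities from `local` and `remote` that assert the same value
/// for one of the attributes listed in `unique_attrs`.
/// Returns sorted, de-duplicated (local entity, remote entity) pairs.
pub fn identify(local: &[Datom], remote: &[Datom], unique_attrs: &[&str]) -> Vec<(u64, u64)> {
    // Index local datoms on unique attributes by (attribute, value).
    let mut index: HashMap<(&str, &str), u64> = HashMap::new();
    for (e, a, v) in local {
        if unique_attrs.contains(&a.as_str()) {
            index.insert((a.as_str(), v.as_str()), *e);
        }
    }

    // Only unique attributes were indexed, so any hit is a candidate pair.
    let mut pairs = Vec::new();
    for (f, a, v) in remote {
        if let Some(e) = index.get(&(a.as_str(), v.as_str())) {
            pairs.push((*e, *f));
        }
    }
    pairs.sort();
    pairs.dedup();
    pairs
}
```

Datoms on non-unique attributes never produce a pair here, which is exactly the part the comment above calls unsettled: the sketch has nothing to say about cardinality-one attributes or domain-specific identification.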

rnewman commented 6 years ago

I'd say, in brief:

A vocabulary is — just like in the real world — a named collection of related 'words'. You might have a vocabulary named :org.mozilla/visits with 'words' (attributes) :visit/date and :visit/page. Each attribute belongs to a single vocabulary. Mentat provides a mechanism for versioning, evolving, and reusing vocabularies. A store can describe entities with multiple vocabularies.
Smushing is the general process of merging two entities that some process of deduction has decided are the same. Those entities might need to be smushed because they were created in different places with the same unique attributes; because they share some intermediate entity that itself was smushed; or as part of some application-level deduction (e.g. history visit compound uniqueness). One can also imagine Mentat's tempid resolution process to be a kind of smushing.

rfk commented 6 years ago

Thanks!

Is "vocabulary" similar to a "schema" in relational db terminology? In what ways do they differ?

Is "smushing" only relevant in the context of syncing, or might it come up in edge-cases of a db on a single machine?

rnewman commented 6 years ago

A relational DB’s schema performs multiple roles:

It defines the entities in the domain and their identifiers, sometimes implicitly, in the form of table names, column names, and primary key constraints.
It defines relationships between those entities (foreign keys and constraints).
It defines structural layout and data types.
… and more besides.

There are multiple SQL schemas that can represent the same domain model.

A Mentat vocabulary doesn’t specify anything about the on-disk structure of data. It’s also simpler, and there are fewer correct ways to skin the cat. But more importantly, each vocabulary is composable with other vocabularies, because each attribute is scoped, each vocabulary is named and versioned, and the space of entities is shared.

Internally we have two concepts that we call a ‘schema’:

The actual SQLite disk schema. Mentat is a SQLite application!
The complete set of attributes present in the store, as well as the idents used to name particular entities. There’s a struct named Schema that allows Mentat to find things like cardinalities and data types for an attribute by name, without having to query the store.
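As a rough picture of the second kind of 'schema', here is a minimal in-memory lookup structure. It is deliberately named `SchemaSketch` to avoid suggesting it matches Mentat's real `Schema` struct; the field names, the `ValueType` variants, and the methods are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Cardinality {
    One,
    Many,
}

/// A small, illustrative subset of value types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValueType {
    Boolean,
    Keyword,
    Uuid,
    Ref,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Attribute {
    pub value_type: ValueType,
    pub cardinality: Cardinality,
    pub unique_identity: bool,
}

/// Hypothetical in-memory schema: maps idents to entity ids, and
/// entity ids to attribute metadata, so lookups never touch the store.
#[derive(Default)]
pub struct SchemaSketch {
    ident_to_entid: HashMap<String, i64>,
    attributes: HashMap<i64, Attribute>,
}

impl SchemaSketch {
    /// Register an attribute under both its ident and its entity id.
    pub fn add_attribute(&mut self, ident: &str, entid: i64, attribute: Attribute) {
        self.ident_to_entid.insert(ident.to_string(), entid);
        self.attributes.insert(entid, attribute);
    }

    /// Resolve an ident such as ":visit/page" to its entity id.
    pub fn entid_for(&self, ident: &str) -> Option<i64> {
        self.ident_to_entid.get(ident).copied()
    }

    /// Look up an attribute's metadata by name.
    pub fn attribute_for(&self, ident: &str) -> Option<&Attribute> {
        let entid = self.entid_for(ident)?;
        self.attributes.get(&entid)
    }
}
```

With a structure like this, a transactor can check that a value has the right type for `:visit/page`, or that the attribute is cardinality-one, before writing anything.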

That is: Mentat internally has a ‘schema’, which it uses to understand inputs. Application code works with vocabularies.

rnewman commented 6 years ago

Smushing is usually a sync-related thing. The main reason for that is that local application code does something similar as it goes: upserting and lookup refs allow transacted data to find existing entities by reference. I see that as a kind of smushing, but without creating an interim entity.
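The upsert behaviour described above can be sketched like this. It is a toy model, not Mentat's transactor: `TinyStore` and `upsert` are hypothetical names, and the store tracks nothing except which entity owns each (unique attribute, value) pair.

```rust
use std::collections::HashMap;

/// Toy store that remembers which entity owns each
/// (unique attribute, value) pair. Hypothetical, for illustration only.
pub struct TinyStore {
    next_entid: u64,
    by_unique: HashMap<(String, String), u64>,
}

impl TinyStore {
    pub fn new() -> Self {
        TinyStore {
            next_entid: 1,
            by_unique: HashMap::new(),
        }
    }

    /// Resolve an assertion on a unique-identity attribute: reuse the
    /// existing entity if one already has this value, otherwise allocate
    /// a fresh entity id. No interim entity is ever created.
    pub fn upsert(&mut self, attribute: &str, value: &str) -> u64 {
        let key = (attribute.to_string(), value.to_string());
        if let Some(&existing) = self.by_unique.get(&key) {
            return existing;
        }
        let entid = self.next_entid;
        self.next_entid += 1;
        self.by_unique.insert(key, entid);
        entid
    }
}
```

Two transactions that assert the same value for the same unique attribute end up describing one entity, which is the sense in which upserting resembles smushing.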

You can imagine systems that merge entities in purely local code, too, but they’re not as common as simple CRUD apps.

mozilla / mentat

Please provide a glossary of mentat technical terms #669