Open ncalexan opened 6 years ago
There are a few ways to model ordered data.
rdf:List
and rdf:first
/rdf:rest
. Lists are superficially inefficient, but implementations can store them efficiently — that is, one can view the linked list format as a reification of a concept — and expose them to consumers efficiently, and Mentat could do the same. SPARQL gives rdf:_1
and similar special properties for querying. Syncing would need some thought. Retrieval of a list is, in a naïve store, recursive.
"I'm bookmark X
; the rest of the list is Y
".X
".[X, Y, Z]
".x friend a, b, c
is equivalent to x friend.0 a; friend.1 b; friend.2 c
. As with the above we have the same three problems: storage, representation, and semantics; this declares that the five-tuple can be extended to a six-tuple to add an index. If we can spare the bytes — and we can — this might be a good option.It's worth thinking for a moment what kinds of operations one might perform on ordered data, and how they feel.
Mentat makes assertion and retraction incremental and individually identified.
A replacement operation on a vector, as in option 4, removes that:
[:db/retract 123 :foo/children [124, 125]]
[:db/add 123 :foo/children [789, 456, 980]]
makes five (or more, depending on your perspective) changes at once. That obscures conflicts: if another client transacts
[:db/retract 123 :foo/children [124, 125]]
[:db/add 123 :foo/children [124, 125, 765]]
it's not clear whether they're appending 765
to the same conceptual list or constructing a new list with some of the same members. In the case of a conflict it's far from clear what the resulting list should be.
By contrast, an rdf:List
approach gives you two ways to handle this.
Let's model it in Mentat.
[{:db/ident :rdf/first
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}
{:db/ident :rdf/rest
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}
{:db/ident :rdf/nil}]
And our example vocabulary:
[{:db/ident :foo/thelist
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}]
Transact the initial data:
[{:db/ident :foo/example
:foo/thelist "list"}
{:db/id "list"
:db/ident :foo/myfirstlist
:rdf/first "124"
:rdf/rest "temp1"}
{:db/id "temp1"
:rdf/first "125"
:rdf/rest :rdf/nil}]
Query for it (ugly and inefficient, because we haven't built any query support for this yet, but it works):
.q [:find [?first ?second]
:where [:foo/example :foo/thelist ?list]
[?list :rdf/first ?first]
[?list :rdf/rest ?rest]
[?rest :rdf/first ?second]
[?rest :rdf/rest :rdf/nil]]
=>
| ?first | ?second |
--- ---
| 65540 | 65541 |
--- ---
When we reconstruct the list, we can decide whether we replace the original list or change it, and this choice can be influenced by our domain — perhaps a team's membership is always new (a "roster" constitutes a "team", with the team's roster changing each season), but the contents of a folder in a UI are mutated in place.
Here let's build a new list and swap over the :foo/thelist
attribute. (Yes, persistent data structures!)
[{:db/ident :foo/example
:foo/thelist "list"}
{:db/id "list"
:db/ident :foo/mysecondlist
:rdf/first "789"
:rdf/rest "temp1"}
{:db/id "temp1"
:rdf/first "456"
:rdf/rest "temp2"}
{:db/id "temp2"
:rdf/first "980"
:rdf/rest :rdf/nil}]
And query again:
.q [:find [?first ?second ?third]
:where [:foo/example :foo/thelist ?list]
[?list :rdf/first ?first]
[?list :rdf/rest ?rest1]
[?rest1 :rdf/first ?second]
[?rest1 :rdf/rest ?rest2]
[?rest2 :rdf/first ?third]
[?rest2 :rdf/rest :rdf/nil]]
=>
| ?first | ?second | ?third |
--- --- ---
| 65551 | 65550 | 65552 |
--- --- ---
If we projected ?list
here you'd see it's a new entity.
A conflict during a sync will be a conflict on the :foo/thelist
property of :foo/example
; all of the other entities are new. This reflects a real conflict in the domain: which of the two valid, internally consistent lists do we keep?
We can also reuse the existing list node and change its properties.
[{:db/id "list"
:db/ident :foo/mysecondlist ; The same node.
:rdf/first "789"
:rdf/rest "temp1"}
{:db/id "temp1"
:rdf/first "456"
:rdf/rest "temp2"}
{:db/id "temp2"
:rdf/first "980"
:rdf/rest :rdf/nil}]
Which one we choose affects the outcome of the append performed by the second client. With mutate-in-place a conflict will arise on :rdf/first
on a list node, or a change can be orphaned because of a simultaneous list swap.
These issues of change and identity are not new in this model; they're essential features of the domain that are exposed by it.
Datomic and Mentat are primarily set oriented, but there's some data that really wants to be ordered and the order is arbitrary. Datomic users tend to handle this with a "transaction function" that atomically updates order attributes during a transaction. (See, for example, https://augustl.com/blog/2013/ordering_cardinality_many_in_datomic/.) That approach is probably possible in Mentat, but it's not easy to arrange right now. This is mostly a TODO item for @rnewman to set down his thoughts on what we should do here.