solid / specification

Solid Technical Reports
https://solidproject.org/TR/

Standardizing state changes in resources (history, undo, sync) #161

Open joepio opened 4 years ago

joepio commented 4 years ago

Solid specs standardize how to represent the current state of a resource (RDF, in some valid serialization format), but there is no part of the spec that describes how to store or share the deltas / changes in data / patches / transactions (I'm just calling them Deltas from now on).

Why store & standardize deltas

Having an append-only event log / ledger that describes every single mutation in a pod can provide some cool features:

One might argue that we don't need a standardized event system for most of these features: every Solid server implementation could create its own way of dealing with versioning, for example.

However, I think standardizing this would improve data portability. If these events are standardized, the user can maintain undo / version history across Solid servers. And besides individual advantages, it would enable more powerful and performant data synchronization. Even very large resources could be updated incrementally, triple for triple.

And besides, since RDF is a relatively simple model, I think standardizing this will not be too complicated.

What the standard should define

What this means for clients

Currently, (most) Solid apps write to a pod by writing a full RDF resource. This works fine for smaller documents, but it becomes very inefficient and error-prone as resources grow to many triples. Therefore, I think that clients should be able to send these state changes to their pod, and both the pod and the client app should be able to parse the delta and apply it to their RDF store.
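As a rough illustration of "parse the delta and apply it", here is a minimal sketch. It assumes a resource is held as a set of N-Triples lines and that a delta is a hypothetical `{ deletes, inserts }` object; neither shape is defined by any Solid spec.

```javascript
// Hypothetical delta shape: two arrays of N-Triples lines.
// Nothing here comes from a Solid spec; it only illustrates the idea.
function applyDelta(triples, delta) {
  const next = new Set(triples); // copy, so the original state is kept
  for (const t of delta.deletes) next.delete(t);
  for (const t of delta.inserts) next.add(t);
  return next;
}

const me = "<https://example.org/profile#me>";
const resource = new Set([
  `${me} <https://schema.org/givenName> "Tim" .`,
  `${me} <https://schema.org/jobTitle> "Engineer" .`,
]);

// Only the changed statements travel over the wire, not the whole resource.
const delta = {
  deletes: [`${me} <https://schema.org/jobTitle> "Engineer" .`],
  inserts: [`${me} <https://schema.org/jobTitle> "Professor" .`],
};

const updated = applyDelta(resource, delta);
```

The same function can run on both sides, which is what makes a shared delta format attractive for sync.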

Ways to standardize event logs

Some initiatives already exist that aim to standardize how deltas should be serialized and interpreted.

Some things to keep in mind when considering (or designing) a delta standard:

RDF-Delta

This standard consists of two concepts: RDF Patch and the RDF Patch log. It introduces a new serialization format, similar to Turtle, where letters placed before statements encode mutations. It also supports header items, e.g. to reference previous commits.

An Apache Jena implementation and a CLI app already exist.
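To make "letters before statements" concrete, here is a toy reader for a small RDF Patch document. It is a sketch under assumptions: it only handles the `H` (header), `TX` / `TC` (transaction begin / commit), `A` (add) and `D` (delete) rows, and it does not validate the RDF terms. The example URIs and the UUID are placeholders.

```javascript
// Toy reader for a small RDF Patch document (simplified, not a full parser).
const patch = `
H id <uuid:0686c69d-8f89-4496-acb5-744f0157a8db> .
TX .
D <http://example/s> <http://example/p> "old" .
A <http://example/s> <http://example/p> "new" .
TC .
`;

function readPatch(text) {
  const result = { headers: {}, adds: [], deletes: [] };
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Row code first, then the payload, then the terminating dot.
    const match = /^(\S+)\s*(.*?)\s*\.$/.exec(line);
    if (!match) continue; // blank or unrecognised line
    const [, code, payload] = match;
    if (code === "H") {
      const idx = payload.indexOf(" ");
      result.headers[payload.slice(0, idx)] = payload.slice(idx + 1).trim();
    } else if (code === "A") {
      result.adds.push(payload);
    } else if (code === "D") {
      result.deletes.push(payload);
    }
    // TX and TC only mark transaction boundaries in this sketch.
  }
  return result;
}

const parsed = readPatch(patch);
```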

linked-delta

linked-delta is serialized in N-Quads and uses the fourth column to semantically describe how a triple should be changed (e.g. update the value, add it, remove it). We created this and use it in our e-democracy application Argu to communicate state changes (when resource attributes change) between back-end and front-end.

The main benefit of this solution is that it is lightweight and does not require a new serialization format, and N-Quads is the RDF serialization format that is easiest to write a parser for. Since the fourth column uses IRIs, the spec is inherently extensible: any IRI can be added, which means that in the future we might come up with many other methods than "add" or "replace". However, this might introduce complexity, since loaders (apps that play back the deltas) might now have to deal with unknown methods.

Some implementations exist: Link-lib (browser, TypeScript) and linked_rails (Ruby on Rails, server side).

  // This is how you can describe and process a linked-delta in a JS app:
  store.processDelta([
    new Statement(
      subject, // https://timbl.inrupt.net/profile/card#me
      predicate, // https://schema.org/firstName
      object, // "Tim"
      ld("replace"), // => http://purl.org/linked-delta/replace
    ),
  ]);
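What a loader does with such statements can be sketched without any library. The version below is an illustration, not the Link-lib implementation: triples are plain arrays, the `replace` IRI is the one from the example above, and the `add` and `remove` IRIs and their exact semantics are assumed to follow the same pattern.

```javascript
// Sketch of a loader that applies linked-delta quads to a list of triples.
// Only the "replace" IRI appears in the example above; "add" and "remove"
// are assumptions made for this illustration.
const LD = "http://purl.org/linked-delta/";

function processDelta(triples, delta) {
  let next = [...triples];
  for (const [s, p, o, method] of delta) {
    // All triples that do NOT share this subject and predicate.
    const others = next.filter((t) => !(t[0] === s && t[1] === p));
    if (method === LD + "add") {
      next.push([s, p, o]);
    } else if (method === LD + "replace") {
      next = [...others, [s, p, o]]; // drop old values, keep the new one
    } else if (method === LD + "remove") {
      next = others;
    } else {
      // Any IRI may appear in the fourth column, so a loader has to
      // decide what to do with methods it does not know.
      throw new Error(`Unknown delta method: ${method}`);
    }
  }
  return next;
}

const me = "https://timbl.inrupt.net/profile/card#me";
const firstName = "https://schema.org/firstName";
const before = [[me, firstName, '"Timothy"']];
const after = processDelta(before, [[me, firstName, '"Tim"', LD + "replace"]]);
```

Throwing on an unknown method is only one option; a loader could also skip such statements, which is exactly the complexity mentioned above.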

This spec does not (yet) standardize the level above a set of quads - and I do think it makes sense to standardize how we denote who created a delta, whether it's signed, when it's created, what the previous hash is (to make a cryptographically valid ledger).
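A minimal sketch of what that level could look like, assuming a hypothetical envelope with `creator`, `createdAt`, `previous` and `hash` fields (none of these names come from linked-delta). It chains entries by hash only; signing is left out.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical envelope around a delta. Field names are illustrative only.
function hashEntry(entry) {
  // Hash a fixed-order array so the result does not depend on key order.
  const canonical = JSON.stringify([
    entry.previous,
    entry.creator,
    entry.createdAt,
    entry.delta,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

function append(log, creator, createdAt, delta) {
  const previous = log.length ? log[log.length - 1].hash : null;
  const entry = { previous, creator, createdAt, delta };
  return [...log, { ...entry, hash: hashEntry(entry) }];
}

function verify(log) {
  return log.every((entry, i) => {
    const expectedPrevious = i === 0 ? null : log[i - 1].hash;
    return entry.previous === expectedPrevious && entry.hash === hashEntry(entry);
  });
}

let log = [];
log = append(log, "https://example.org/alice#me", "2020-01-01T00:00:00Z", ["delta 1"]);
log = append(log, "https://example.org/alice#me", "2020-01-02T00:00:00Z", ["delta 2"]);
```

Changing any earlier entry changes its hash, so `verify` fails for every log that was tampered with after the fact.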

Currently, the order in which statements appear in a linked-delta document does not have any semantic meaning, and there are rules that determine in what order a parser (loader?) should interpret all delta statements.

N3 Patches

Tim Berners-Lee mentioned N3 Patches during a meeting some time ago, as an alternative, but I failed to find more about this.

LD-Patch

LD-Patch is a W3C working group spec that also introduces a new serialization language.

SPARQL updates

SPARQL-Update supports INSERT and DELETE, so you could use these SPARQL Update strings to store deltas.
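For example, a delta could be turned into a SPARQL Update string roughly like this. The `{ deletes, inserts }` input shape is an assumption of this sketch; `DELETE DATA` and `INSERT DATA` are standard SPARQL 1.1 Update operations, but they only cover ground triples (no variables, and no blank nodes in `DELETE DATA`).

```javascript
// Turn a delta (two arrays of N-Triples-style statements) into a
// SPARQL Update string. The input shape is hypothetical.
function toSparqlUpdate({ deletes = [], inserts = [] }) {
  const operations = [];
  if (deletes.length) {
    operations.push(`DELETE DATA {\n  ${deletes.join("\n  ")}\n}`);
  }
  if (inserts.length) {
    operations.push(`INSERT DATA {\n  ${inserts.join("\n  ")}\n}`);
  }
  // Multiple operations in one request are separated by ";".
  return operations.join(" ;\n");
}

const update = toSparqlUpdate({
  deletes: ['<http://example/s> <http://example/p> "old" .'],
  inserts: ['<http://example/s> <http://example/p> "new" .'],
});
```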

Using PROV / other reification methods

Maybe the right way to store changes is to express them in RDF, perhaps using the PROV ontology. This would of course eliminate the need for a new serialization format.

However, I feel like it should be trivial / really simple to convert these change statements into valid RDF.

Atomic Commits

See https://docs.atomicdata.dev/commits/intro.html

disclaimer: This is a design of my own.

It's a JSON-based serialization of state changes, which allows for full traceability using cryptographic signatures. It's implemented and used in atomic server and atomic data browser. It only works with a strict subset of RDF.

TL;DR

Using deltas to communicate state changes is efficient and makes P2P state sharing easier. Storing deltas makes it easier to deal with backups, versioning, undo, and adding new query options. Various solutions exist, but perhaps we need something else.

Most importantly, we should pick one, and I'd love to hear your thoughts on this!

RubenVerborgh commented 4 years ago

Missing above seems to be the issue of blank nodes and canonicalization (see Aidan Hogan and others).

Tim Berners-Lee mentioned N3 Patches during a meeting some time ago, as an alternative, but I failed to find more about this.

Implemented by yours truly in https://github.com/solid/node-solid-server/blob/v5.2.2/lib/handlers/patch/n3-patch-parser.js (https://github.com/solid/node-solid-server/pull/516)

how to store or share the deltas

Can you maybe be a bit more precise about the problem we are solving? Because storage is not a Solid concern (the specs only govern exchange). Is this about PATCH?

Because versioning itself is just Memento (and that design is part of server architectures).

joepio commented 4 years ago

Can you maybe be a bit more precise about the problem we are solving? Because storage is not a Solid concern (the specs only govern exchange). Is this about PATCH?

Because versioning itself is just Memento (and that design is part of server architectures).

I'm mostly thinking about client-server (two way) communication, e.g. during collaborative document editing, but I think that standardized deltas can be useful in many contexts. And for many of these use cases, storing the deltas themselves is important (auditability + P2P state replication = why git is awesome). Now I agree that we should not care about how these deltas are stored (any Solid server implementation can do whatever it likes), but providing a standard interface for accessing and appending these deltas is something that the spec maybe should cover.

RubenVerborgh commented 4 years ago

I'm mostly thinking about client-server (two way) communication, e.g. during collaborative document editing, but I think that standardized deltas can be useful in many contexts.

OK but then we should probably have collaborative editing as an issue/use case.

And for many of these use cases, storing the deltas themselves is important

exposing; slight difference, but important in the Solid context, because the specs only govern the exchanges.

gsvarovsky commented 3 years ago

OK but then we should probably have collaborative editing as an issue/use case.

I've just done a scrape of related use-cases from Solid and w3, and written it up on the forum, and indeed, it's not expressed directly.

However https://github.com/solid/user-stories/issues/22: "As a developer, I want to be able to subscribe to a stream of pod events" supports this ticket in general.

TallTed commented 3 years ago

Using NQuads for the deltas, and using the context (fourth) column as an "action" indicator only works if you're applying these deltas to a single graph, i.e., that your target is not working with Named Graphs, which at least some Solid servers (will) do. Something to consider as this suggestion moves forward...

joepio commented 3 years ago

Using NQuads for the deltas, and using the context (fourth) column as an "action" indicator only works if you're applying these deltas to a single graph, i.e., that your target is not working with Named Graphs, which at least some Solid servers (will) do. Something to consider as this suggestion moves forward...

My colleague Thom suggested using a query parameter in the 'action' field, if you use named graphs.
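A sketch of how a loader might read such an 'action' IRI. The parameter name `graph` is an assumption made for this example; the thread does not say what the parameter would be called.

```javascript
// Split an 'action' IRI into the method and an optional target graph.
// The "graph" parameter name is hypothetical.
function parseAction(actionIri) {
  const url = new URL(actionIri);
  const graph = url.searchParams.get("graph"); // null when absent
  url.search = ""; // strip the query to recover the bare method IRI
  return { method: url.href, graph };
}

const action = parseAction(
  "http://purl.org/linked-delta/add?graph=" +
    encodeURIComponent("https://example.org/graphs/profile")
);
// action.method is the bare method IRI, action.graph the named graph to target.
```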

joepio commented 2 years ago

I'm happy to see N3 Patch has been invented and added to the spec. This is an interesting alternative to the existing specs mentioned earlier. I'm still a bit sceptical about relying on the N3 serialization format, as it will be new to most developers and it can be quite hard to parse.

Anyway, if N3 patches are persisted (and named) by a server, it becomes possible to construct versions / a history.
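For reference, a sketch of what sending an N3 Patch could look like from a client. It only builds the request and does not send it; the resource URL and the `ex:` vocabulary are placeholders, and the helper name is made up for this example.

```javascript
// Build (but do not send) a PATCH request carrying an N3 Patch.
// The resource URL and the ex: terms below are placeholders.
function buildN3PatchRequest(resourceUrl, { deletes, inserts }) {
  const body = [
    "@prefix solid: <http://www.w3.org/ns/solid/terms#>.",
    "@prefix ex: <http://www.example.org/terms#>.",
    "",
    "_:patch a solid:InsertDeletePatch;",
    `  solid:deletes { ${deletes} };`,
    `  solid:inserts { ${inserts} }.`,
  ].join("\n");
  return {
    url: resourceUrl,
    method: "PATCH",
    headers: { "Content-Type": "text/n3" },
    body,
  };
}

const request = buildN3PatchRequest("https://pod.example/profile/card", {
  deletes: '<#me> ex:givenName "Timothy".',
  inserts: '<#me> ex:givenName "Tim".',
});
```

An authenticated client could pass this to `fetch(request.url, request)`. A server that stores each such body under its own name would have the raw material for the history described above.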